Dear Maintainers,

I would like to provide some additional background for this patchset.

We observed a crash that reproduces with high probability on an Android
device running a 6.1.145-based kernel when recording preemptirq
tracepoints for a user space process with dwarf callchains enabled.

The command used to reproduce the issue is:

   simpleperf record -p <PID> -f 10000 \
     -e preemptirq:preempt_disable \
     -e preemptirq:preempt_enable \
     --duration 9 --call-graph dwarf \
     -o /data/local/tmp/perf.data

Here <PID> is the PID of a user space process, for example a foreground
application UI thread or RenderThread.

One important observation is that the crash does not reproduce if
"--call-graph dwarf" is removed.

The crash log shows a data abort on a user virtual address while the PC
is at a probed kernel instruction:

   [  297.177775] Unable to handle kernel paging request at virtual address 0000007ff042e000
   [  297.177792] Mem abort info:
   [  297.177795]   ESR = 0x0000000096000007
   [  297.177799]   EC = 0x25: DABT (current EL), IL = 32 bits
   [  297.177803]   SET = 0, FnV = 0
   [  297.177806]   EA = 0, S1PTW = 0
   [  297.177808]   FSC = 0x07: level 3 translation fault
   [  297.177811] Data abort info:
   [  297.177814]   ISV = 0, ISS = 0x00000007
   [  297.177817]   CM = 0, WnR = 0
   [  297.177820] user pgtable: 4k pages, 39-bit VAs, pgdp=000000098c9f2000
   [  297.177825] [0000007ff042e000] pgd=08000009aaaea003, p4d=08000009aaaea003, pud=08000009aaaea003, pmd=08000000abca0003, pte=0000000000000000
   [  297.177835] Internal error: Oops: 0000000096000007 [#1] PREEMPT SMP
   [  297.178070] Skip md ftrace buffer dump for: 0x2800d70
   ...
   [  297.178485] CPU: 6 PID: 10214 Comm: id.article.news Tainted: P S     W  O       6.1.145-android14-11-maybe-dirty-qki-consolidate #1
   [  297.178489] Hardware name: Qualcomm Technologies, Inc. Volcano QRD,x6878 (DT)
   [  297.178491] pstate: 22400005 (nzCv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
   [  297.178493] pc : folio_wait_bit_common+0x0/0x408
   [  297.178499] lr : perf_output_sample+0x57c/0xacc
   [  297.178502] sp : ffffffc0366c2f90
   [  297.178503] x29: ffffffc0366c2fb0 x28: 0000000000001000 x27: 0000007ff042d5f8
   [  297.178507] x26: 00000000000035e7 x25: 0000000000000000 x24: ffffff892cec3000
   [  297.178510] x23: 0000000000001000 x22: 0000000000009370 x21: ffffffc0366c3140
   [  297.178512] x20: ffffff888aa1a180 x19: ffffffc0366c3020 x18: ffffffe01103b340
   [  297.178515] x17: 00000000ad6b63b6 x16: 00000000ad6b63b6 x15: 0000007ff042d5f8
   [  297.178518] x14: 0000000000000000 x13: 003436737365636f x12: 72705f7070612f6e
   [  297.178520] x11: 69622f6d65747379 x10: 732f0030333d7972 x9 : 616d6972705f6c6f
   [  297.178523] x8 : 6f705f706173755f x7 : 54454b434f535f44 x6 : ffffff892cec39d8
   [  297.178526] x5 : ffffff892cec4000 x4 : 0000000000000008 x3 : 6e6f6973736e6172
   [  297.178528] x2 : 00000000000005b8 x1 : 0000007ff042e000 x0 : ffffff892cec3000
   [  297.178531] Call trace:
   [  297.178532]  folio_wait_bit_common+0x0/0x408
   [  297.178535]  perf_event_output_forward+0x90/0xdc
   [  297.178537]  __perf_event_overflow+0x128/0x1e8
   [  297.178540]  perf_swevent_event+0x94/0x1a0
   [  297.178543]  perf_tp_event+0x140/0x270
   [  297.178545]  perf_trace_run_bpf_submit+0x84/0xe0
   [  297.178547]  perf_trace_preemptirq_template+0xe8/0x124
   [  297.178553]  trace_preempt_on+0xec/0x150
   [  297.178555]  preempt_count_sub+0xa8/0x12c
   [  297.178562]  do_debug_exception+0xd0/0x148
   [  297.178568]  el1_dbg+0x64/0x80
   [  297.178575]  el1h_64_sync_handler+0x3c/0x90
   [  297.178577]  el1h_64_sync+0x68/0x6c
   [  297.178579]  folio_wait_bit_common+0x0/0x408
   [  297.178582]  __get_node_page+0xdc/0x49c
   [  297.178587]  f2fs_get_dnode_of_data+0x404/0x950
   [  297.178589]  f2fs_map_blocks+0x1e0/0xdf8
   [  297.178591]  f2fs_mpage_readpages+0x1f0/0x8d0
   [  297.178594]  f2fs_readahead+0x84/0x10c
   [  297.178596]  read_pages+0xb8/0x434
   [  297.178603]  page_cache_ra_unbounded+0x9c/0x2f0
   [  297.178605]  page_cache_ra_order+0x2b0/0x348
   [  297.178608]  do_sync_mmap_readahead+0xd0/0x228
   [  297.178612]  filemap_fault+0x158/0x46c
   [  297.178615]  f2fs_filemap_fault+0x28/0x114
   [  297.178617]  handle_mm_fault+0x4f8/0x1468
   [  297.178620]  do_page_fault+0x208/0x4b8
   [  297.178622]  do_translation_fault+0x38/0x54
   [  297.178624]  do_mem_abort+0x58/0x118
   [  297.178626]  el0_da+0x48/0xb8
   [  297.178629]  el0t_64_sync_handler+0x98/0xb4
   [  297.178632]  el0t_64_sync+0x1a4/0x1a8
   [  297.178634] Code: 94000004 a8c17bfd d50323bf d65f03c0 (d4200080)
   [  297.178639] ---[ end trace 0000000000000000 ]---
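
For reference, the ESR value in the log above can be decoded by hand. The
following is a small user space sketch, not kernel code; the struct and
function names are hypothetical, and the bit positions follow the ESR_ELx
layout for data aborts (EC in bits 31:26, IL in bit 25, WnR in bit 6, FSC
in bits 5:0).

```c
#include <stdint.h>

/* Hypothetical container for the fields of interest in a data abort ESR. */
struct esr_dabt_fields {
    unsigned int ec;   /* exception class, ESR[31:26] */
    unsigned int il;   /* instruction length bit, ESR[25] */
    unsigned int wnr;  /* write-not-read, ESR[6] */
    unsigned int fsc;  /* fault status code, ESR[5:0] */
};

/* Split an ESR value into the fields printed in the "Mem abort info" lines. */
static struct esr_dabt_fields decode_esr_dabt(uint64_t esr)
{
    struct esr_dabt_fields f;

    f.ec  = (unsigned int)((esr >> 26) & 0x3f);
    f.il  = (unsigned int)((esr >> 25) & 0x1);
    f.wnr = (unsigned int)((esr >> 6) & 0x1);
    f.fsc = (unsigned int)(esr & 0x3f);
    return f;
}
```

Calling decode_esr_dabt(0x96000007) gives EC = 0x25, WnR = 0 and FSC = 0x07,
i.e. a read that took a level 3 translation fault at the current EL, which
agrees with the decoded lines in the log.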

The instruction d4200080 in the "Code:" line (BRK #0x4) is the kprobe BRK
instruction. The stack also shows that the fault happens while handling
a kprobe debug exception, and the perf/trace path is entered from that
window.
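
As a quick cross-check of the opcode, here is a minimal user space sketch
that extracts the immediate from an A64 BRK encoding. The helper names are
hypothetical, and the value 0x004 for KPROBES_BRK_IMM is an assumption
taken from arch/arm64/include/asm/brk-imm.h; please verify it against the
tree in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of KPROBES_BRK_IMM (arch/arm64/include/asm/brk-imm.h). */
#define MODEL_KPROBES_BRK_IMM 0x004u

/* A64 "BRK #imm16" is encoded as 0xd4200000 | (imm16 << 5). */
static bool is_brk_insn(uint32_t insn)
{
    return (insn & 0xffe0001fu) == 0xd4200000u;
}

/* Return the 16-bit immediate of a BRK instruction. */
static uint16_t brk_insn_imm(uint32_t insn)
{
    assert(is_brk_insn(insn));
    return (uint16_t)((insn >> 5) & 0xffffu);
}

/* True if the opcode is a BRK carrying the (assumed) kprobe immediate. */
static bool is_kprobe_brk(uint32_t insn)
{
    return is_brk_insn(insn) && brk_insn_imm(insn) == MODEL_KPROBES_BRK_IMM;
}
```

With these helpers, is_kprobe_brk(0xd4200080) is true, while the preceding
word in the Code line, d65f03c0 (a RET), is not a BRK at all.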

From the fulldump analysis, the issue appears to be related to the arm64
kprobe single-step/reentry handling. While a kprobe is preparing or
executing its XOL single-step instruction, perf/trace code may run in 
the same window. With dwarf callchains enabled, this path may also 
access user memory and take a data abort. In addition, another kprobe 
may be hit while the first kprobe is still in KPROBE_HIT_SS state.

This matches the type of issue that was fixed on x86 by the following
commits:

   6381c24cd6d5 ("kprobes/x86: Fix page-fault handling logic")
   6a5022a56ac3 ("kprobes/x86: Allow to handle reentered kprobe on single-stepping")

This patchset applies the same idea to arm64:

   - Patch 1 makes the arm64 kprobe fault handler handle a fault in
     KPROBE_HIT_SS/KPROBE_REENTER only when the faulting PC is the current
     kprobe's XOL instruction. Otherwise, the fault is left to the normal
     fault handling path.

   - Patch 2 allows a kprobe hit in KPROBE_HIT_SS to be handled as a
     recoverable one-level reentry. The unrecoverable case remains a hit
     while already in KPROBE_REENTER.
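
The two decisions described above can be modelled roughly as follows. This
is a simplified user space sketch of the intended behaviour, not the code
in the patches: the enum and function names are hypothetical, and the
numeric values only mirror what I believe the KPROBE_* status constants in
include/linux/kprobes.h to be.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the kprobe status values (assumed, see lead-in). */
enum model_kprobe_status {
    MODEL_HIT_ACTIVE = 0x1,
    MODEL_HIT_SS     = 0x2,
    MODEL_REENTER    = 0x4,
    MODEL_HIT_SSDONE = 0x8,
};

/*
 * Patch 1 idea: the kprobe fault handler only claims a fault taken in
 * HIT_SS/REENTER when the faulting PC is the current kprobe's XOL
 * instruction. Everything else goes to the normal fault path.
 */
static bool model_kprobe_claims_fault(enum model_kprobe_status status,
                                      uintptr_t fault_pc,
                                      uintptr_t xol_insn_addr)
{
    if (status != MODEL_HIT_SS && status != MODEL_REENTER)
        return false;
    return fault_pc == xol_insn_addr;
}

/*
 * Patch 2 idea: a hit while in HIT_SS becomes a recoverable one-level
 * reentry; a hit while already in REENTER remains unrecoverable.
 */
static bool model_reentry_recoverable(enum model_kprobe_status status)
{
    switch (status) {
    case MODEL_HIT_ACTIVE:
    case MODEL_HIT_SSDONE:
    case MODEL_HIT_SS:
        return true;
    case MODEL_REENTER:
        return false;
    }
    return false;
}
```

In this model, a user memory fault raised from the perf path while the
status is HIT_SS is not claimed by the kprobe handler, because its PC is
not the XOL slot, so the normal fault handling path deals with it.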

With both patches applied, the same stress test has been running for
three days and the crash has not reproduced.

I still have the full dmesg and fulldump from the crash device. Please
let me know if any additional information would be useful.

Thanks,
hupu
