On Wed, 2025-10-15 at 22:14 +0800, Adrian Huang (Lenovo) wrote: > When executing the following command (reproducible 100%), a kernel oops > occurs: >
Thanks Adrian for the patch, I believe your fix is equivalent to Nam's [1], we'll try to get [1] merged for -rc2. Thanks, Gabriele [1] - https://lore.kernel.org/lkml/[email protected] > # cat /sys/kernel/debug/tracing/rv/enabled_monitors > > [dmesg] > BUG: unable to handle page fault for address: ffffffffffffffd0 > #PF: supervisor read access in kernel mode > #PF: error_code(0x0000) - not-present page > PGD 3ddf828067 P4D 3ddf829067 PUD 3ddf82b067 PMD 0 > Oops: Oops: 0000 [#1] SMP NOPTI > CPU: 479 UID: 0 PID: 15237 Comm: cat Kdump: loaded Not tainted 6.18.0-rc1 > #25 PREEMPT(voluntary) > Hardware name: Lenovo ThinkSystem SR645 V3 MB,Genoa,DDR5,Oahu,1U/SB27B31174, > BIOS KAE139B-5.70 06/11/2025 > RIP: 0010:enabled_monitors_next+0x41/0x60 > Code: c0 66 a6 bb 75 24 eb 2d 66 66 2e 0f 1f 84 00 00 00 00 00 66 0f 1f 44 > 00 00 48 8b 50 40 48 8d 42 c0 48 81 fa c0 66 a6 bb 74 0b <80> 78 10 00 74 e9 > c3 cc cc cc cc 31 c0 c3 cc cc cc cc 66 66 2e 0f > RSP: 0018:ff565a8b1e653d38 EFLAGS: 00010203 > RAX: ffffffffffffffc0 RBX: ff41e057260eabb8 RCX: 0000000000000001 > RDX: 0000000000000000 RSI: ffffffffbba66640 RDI: ff41e057260eabb8 > RBP: 0000000000000000 R08: ffffffffbb313406 R09: 0000000000000034 > R10: 0000000000000000 R11: 0000000000000004 R12: ff565a8b1e653dd0 > R13: ff565a8b1e653da8 R14: ffffffffbba66640 R15: 0000000000000000 > FS: 00007f8da78a4740(0000) GS:ff41e076cde6e000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: ffffffffffffffd0 CR3: 00000001b6b26005 CR4: 0000000000f71ef0 > PKRU: 55555554 > Call Trace: > <TASK> > seq_read_iter+0x2ed/0x480 > seq_read+0x12e/0x160 > vfs_read+0xc1/0x370 > ? count_memcg_events+0xb6/0x170 > ksys_read+0x6b/0xe0 > do_syscall_64+0x89/0x810 > entry_SYSCALL_64_after_hwframe+0x76/0x7e > RIP: 0033:0x7f8da79a0321 > > [Root Cause] > enabled_monitors_start() calls enabled_monitors_next() and passes the > address of a struct rv_monitor instead of the address of its embedded > list_head. > > Commit de090d1ccae1 ("rv: Fix wrong type cast in enabled_monitors_next()") > assumes that the argument p refers to a list_head (the embedded list > anchor). This leads to the miscalculated address for the corresponding > struct rv_monitor object. > > Here’s what happens in detail: > > 1. Address of rv_monitors_list = 0xffffffffbba666c0 > crash> p &rv_monitors_list > $1 = (struct list_head *) 0xffffffffbba666c0 <rv_monitors_list> > > 2. offset of rv_monitor.list = 0x40 > crash> rv_monitor.list -x > struct rv_monitor { > [0x40] struct list_head list; > } > > 3. In enabled_monitors_start(): The local variable mon is assigned > using list_entry(), resulting in: > mon = &rv_monitors_list - 0x40 = 0xffffffffbba66680 > > 4. In enabled_monitors_next(): The argument p becomes > &rv_monitors_list - 0x40. However, container_of() subtracts the > offset of rv_monitor.list again, making mon equal to > &rv_monitors_list - 0x80. This double subtraction results in an > invalid address and triggers the page fault. > > Fix the issue by returning the address of the list_head from both > enabled_monitors_start() and enabled_monitors_next() instead of the > address of struct rv_monitor. > > The following verifications make sure the issue is fixed: > 1. Without enabling any monitors > # cat /sys/kernel/debug/tracing/rv/enabled_monitors > <No output; no kernel oops.> > > 2. Enable monitor and reading enabled_monitors: Test #1 > # echo 1 > /sys/kernel/debug/tracing/rv/monitors/wwnr/enable > # echo 1 > /sys/kernel/debug/tracing/rv/monitors/rtapp/sleep/enable > # cat /sys/kernel/debug/tracing/rv/enabled_monitors > wwnr > rtapp:sleep > > 3. Enable monitor and reading enabled_monitors: Test #2 > # echo 0 > /sys/kernel/debug/tracing/rv/monitors/wwnr/enable > # echo 1 > /sys/kernel/debug/tracing/rv/monitors/rtapp/sleep/enable > # echo 1 > /sys/kernel/debug/tracing/rv/monitors/rtapp/pagefault/enable > # echo 1 > /sys/kernel/debug/tracing/rv/monitors/rtapp/enable > # cat /sys/kernel/debug/tracing/rv/enabled_monitors > rtapp > rtapp:sleep > rtapp:pagefault > > Fixes: de090d1ccae1 ("rv: Fix wrong type cast in enabled_monitors_next()") > Signed-off-by: Adrian Huang (Lenovo) <[email protected]> > --- > kernel/trace/rv/rv.c | 14 ++++++-------- > 1 file changed, 6 insertions(+), 8 deletions(-) > > diff --git a/kernel/trace/rv/rv.c b/kernel/trace/rv/rv.c > index 48338520376f..021cc9bc57ab 100644 > --- a/kernel/trace/rv/rv.c > +++ b/kernel/trace/rv/rv.c > @@ -501,7 +501,7 @@ static void *enabled_monitors_next(struct seq_file *m, > void *p, loff_t *pos) > > list_for_each_entry_continue(mon, &rv_monitors_list, list) { > if (mon->enabled) > - return mon; > + return &mon->list; > } > > return NULL; > @@ -509,23 +509,21 @@ static void *enabled_monitors_next(struct seq_file *m, > void *p, loff_t *pos) > > static void *enabled_monitors_start(struct seq_file *m, loff_t *pos) > { > - struct rv_monitor *mon; > + struct list_head *lh = &rv_monitors_list; > loff_t l; > > mutex_lock(&rv_interface_lock); > > - if (list_empty(&rv_monitors_list)) > + if (list_empty(lh)) > return NULL; > > - mon = list_entry(&rv_monitors_list, struct rv_monitor, list); > - > for (l = 0; l <= *pos; ) { > - mon = enabled_monitors_next(m, mon, &l); > - if (!mon) > + lh = enabled_monitors_next(m, lh, &l); > + if (!lh) > break; > } > > - return mon; > + return lh; > } > > /*
