applied: https://github.com/crash-utility/crash/commit/3d57b4b393558dd4b22312eef251af11bf6530f5
On Mon, Nov 3, 2025 at 8:45 PM lijiang <[email protected]> wrote:
>
> On Mon, Nov 3, 2025 at 11:57 AM Tao Liu <[email protected]> wrote:
>>
>> On Mon, Nov 3, 2025 at 2:58 PM lijiang <[email protected]> wrote:
>> >
>> > On Fri, Oct 31, 2025 at 5:31 AM Tao Liu <[email protected]> wrote:
>> >>
>> >> Hi lianbo,
>> >>
>> >> On Tue, Oct 28, 2025 at 9:57 PM Lianbo Jiang <[email protected]> wrote:
>> >> >
>> >> > Recently we have observed some failures as below:
>> >> >
>> >> > crash> set 2276866
>> >> > set: invalid kernel virtual address: 0 type: "stack contents"
>> >> > set: read of stack at 0 failed
>> >> >
>> >> > crash> ps 2276866
>> >> >       PID    PPID  CPU       TASK        ST  %MEM  VSZ  RSS  COMM
>> >> >   2276866 2276750   47  ff3a19fbd3c80000  ZO   0.0    0    0  sh
>> >> >
>> >> > This is a regression introduced by adding gdb stack unwind
>> >> > support. Before attempting to read from the stack, we first need to
>> >> > check whether the stack exists, otherwise the read may fail in some
>> >> > corner cases, e.g. for zombie processes (ZO), whose stack does not
>> >> > exist. Furthermore, this may also break thread switching in gdb.
>> >> >
>> >> > With the patch:
>> >> > crash> set 2276866
>> >> >     PID: 2276866
>> >> > COMMAND: "sh"
>> >> >    TASK: ff3a19fbd3c80000  [THREAD_INFO: ff3a19fbd3c80000]
>> >> >     CPU: 47
>> >> >   STATE: EXIT_DEAD|EXIT_ZOMBIE
>> >> >
>> >> > Reported-by: Buland Kumar Singh <[email protected]>
>> >> > Signed-off-by: Lianbo Jiang <[email protected]>
>> >> > ---
>> >> >  arm64.c  | 2 ++
>> >> >  ppc64.c  | 2 ++
>> >> >  x86_64.c | 2 ++
>> >> >  3 files changed, 6 insertions(+)
>> >> >
>> >> > diff --git a/arm64.c b/arm64.c
>> >> > index 354d17ab6a19..17235950bb60 100644
>> >> > --- a/arm64.c
>> >> > +++ b/arm64.c
>> >> > @@ -234,6 +234,8 @@ arm64_get_current_task_reg(int regno, const char *name,
>> >> >
>> >> >         BZERO(&bt_setup, sizeof(struct bt_info));
>> >> >         clone_bt_info(&bt_setup, &bt_info, tc);
>> >> > +       if (bt_info.stackbase == 0)
>> >> > +               return FALSE;
>> >> >         fill_stackbuf(&bt_info);
>> >> >
>> >> >         get_dumpfile_regs(&bt_info, &sp, &ip);
>> >> > diff --git a/ppc64.c b/ppc64.c
>> >> > index d1a506773c93..9c5c0a460c7a 100644
>> >> > --- a/ppc64.c
>> >> > +++ b/ppc64.c
>> >> > @@ -2606,6 +2606,8 @@ ppc64_get_current_task_reg(int regno, const char *name, int size,
>> >> >
>> >> >         BZERO(&bt_setup, sizeof(struct bt_info));
>> >> >         clone_bt_info(&bt_setup, &bt_info, tc);
>> >> > +       if (bt_info.stackbase == 0)
>> >> > +               return FALSE;
>> >> >         fill_stackbuf(&bt_info);
>> >> >
>> >> >         // reusing the get_dumpfile_regs function to get pt regs structure
>> >> > diff --git a/x86_64.c b/x86_64.c
>> >> > index d7da536d20d8..b2cddbf8ba3d 100644
>> >> > --- a/x86_64.c
>> >> > +++ b/x86_64.c
>> >> > @@ -9383,6 +9383,8 @@ x86_64_get_current_task_reg(int regno, const char *name,
>> >> >
>> >> >         BZERO(&bt_setup, sizeof(struct bt_info));
>> >> >         clone_bt_info(&bt_setup, &bt_info, tc);
>> >> > +       if (bt_info.stackbase == 0)
>> >> > +               return FALSE;
>> >>
>> >> The fix makes sense to me; however, returning directly will leave the
>> >> register cache unrefreshed.
>> >> That is, with the return "FALSE", "set
>> >> 2276866" will succeed in task switching, but the register cache is
>> >> still the old one, so "gdb bt" still outputs the previous stack trace,
>> >> which is not 2276866's stack. I suggest adding a warning telling users
>> >
>> >
>> > Actually, I haven't seen the case you mentioned, and it works as expected:
>> >
>> > Without the patch:
>> > crash> set 2276866
>> > set: invalid kernel virtual address: 0 type: "stack contents"
>> > set: read of stack at 0 failed
>> >
>> > crash> bt
>> > PID: 2276866 TASK: ff3a19fbd3c80000 CPU: 47 COMMAND: "sh"
>> > (no stack)
>> >
>> > crash> gdb bt
>> > #0 crash_setup_regs (oldregs=0x0, newregs=0xff43e468633c7d38) at
>> > ./arch/x86/include/asm/processor.h:58
>> > #1 __crash_kexec (regs=regs@entry=0x0) at kernel/kexec_core.c:952
>> > #2 0xffffffff86cf976f in panic (fmt=fmt@entry=0xffffffff87f69f99 "sysrq
>> > triggered crash\n") at kernel/panic.c:230
>> > #3 0xffffffff87210201 in sysrq_handle_crash (key=<optimized out>) at
>> > drivers/tty/sysrq.c:142
>> > #4 0xffffffff87210b24 in __handle_sysrq (key=99, check_mask=<optimized
>> > out>) at drivers/tty/sysrq.c:559
>> > #5 0xffffffff872109cb in write_sysrq_trigger (file=<optimized out>,
>> > buf=<optimized out>, count=2, ppos=<optimized out>) at
>> > drivers/tty/sysrq.c:1106
>> > #6 0xffffffff86ff5fc9 in proc_reg_write (file=<optimized out>,
>> > buf=<optimized out>, count=<optimized out>, ppos=<optimized out>) at
>> > fs/proc/inode.c:241
>> > #7 0xffffffff86f6e845 in vfs_write (pos=0xff43e468633c7f08, count=2,
>> > buf=0x7ffc5b412780 <error: Cannot access memory at address
>> > 0x7ffc5b412780>, file=0xff3a19e92ee37b00) at fs/read_write.c:549
>> > #8 vfs_write (file=0xff3a19e92ee37b00, buf=0x7ffc5b412780 <error: Cannot
>> > access memory at address 0x7ffc5b412780>, count=<optimized out>,
>> > pos=0xff43e468633c7f08) at fs/read_write.c:533
>> > #9 0xffffffff86f6eacf in ksys_write (fd=<optimized out>,
>> > buf=0x7ffc5b412780 <error: Cannot access memory at address
>> > 0x7ffc5b412780>, count=2) at fs/read_write.c:598
>> > #10 0xffffffff86c03cab in do_syscall_64 (nr=1, regs=0xff43e468633c7f58) at
>> > arch/x86/entry/common.c:303
>> > #11 0xffffffff8780012e in entry_SYSCALL_64 () at
>> > arch/x86/entry/entry_64.S:147
>> > crash>
>> >
>> > The above case breaks thread switching in gdb, as I mentioned in the
>> > patch log.
>> >
>> > With the patch:
>> > crash> set 2276866
>> >     PID: 2276866
>> > COMMAND: "sh"
>> >    TASK: ff3a19fbd3c80000  [THREAD_INFO: ff3a19fbd3c80000]
>> >     CPU: 47
>> >   STATE: EXIT_DEAD|EXIT_ZOMBIE
>> >
>> > crash> bt
>> > PID: 2276866 TASK: ff3a19fbd3c80000 CPU: 47 COMMAND: "sh"
>> > (no stack)
>> >
>> > crash> gdb bt
>> > crash>
>> >
>> > That is the expected behavior, and I did not see the case that you pointed out.
>> >
>> >
>> >> that gdb related commands such as 'bt', 'frame', 'up', 'down', 'info
>> >> locals' are not workable, like:
>> >
>> >
>> > Have you reproduced the case where the register cache is unrefreshed?
>>
>> Right, I re-tested the patch and it works as expected, sorry for the
>> confusion. For the patch, ack.
>>
>
> No worries. Thanks for the review, Tao.
>
> Lianbo
>
>> Thanks,
>> Tao Liu
>>
>> >
>> > Thanks
>> > Lianbo
>> >
>> >>
>> >> Warning: registers unable to refresh, the outputs of the following gdb
>> >> related commands are not reliable: 'bt', 'frame', 'up', 'down', 'info
>> >> locals'.
>> >>
>> >> What do you think?
>> >>
>> >> Thanks,
>> >> Tao Liu
>> >>
>> >>
>> >>
>> >> >         fill_stackbuf(&bt_info);
>> >> >
>> >> >         // reusing the get_dumpfile_regs function to get pt regs structure
>> >> > --
>> >> > 2.50.1
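
For readers skimming the thread, the guard added by the patch can be illustrated outside of crash with a small standalone sketch. Everything below is hypothetical and is not crash code: struct demo_bt_info and demo_get_task_reg() are illustration-only names that model just the stackbase check, assuming a stackbase of 0 means the task (e.g. a zombie) has no stack to read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TRUE  1
#define FALSE 0

/* Hypothetical stand-in for crash's struct bt_info; only the fields
 * needed for this illustration are modelled. */
struct demo_bt_info {
	unsigned long stackbase;        /* 0: the task has no stack */
	unsigned long stacktop;         /* one past the last stack byte */
	const unsigned char *stackbuf;  /* copy of the stack contents */
};

/* Hypothetical register fetch: copy `size` bytes located `offset` bytes
 * into the stack, but only after confirming that a stack exists. */
int demo_get_task_reg(const struct demo_bt_info *bt, size_t offset,
		      size_t size, void *value)
{
	size_t len;

	assert(bt != NULL && value != NULL);

	/* The guard from the patch: bail out before touching the stack. */
	if (bt->stackbase == 0)
		return FALSE;

	/* Reject reads that would run past the end of the stack. */
	len = (size_t)(bt->stacktop - bt->stackbase);
	if (size > len || offset > len - size)
		return FALSE;

	memcpy(value, bt->stackbuf + offset, size);
	return TRUE;
}
```

With a zero stackbase the function reports failure without reading anything, which mirrors why "set" no longer prints "read of stack at 0 failed" for the zombie task once the check is in place; a caller would treat FALSE as "no register value available" for that task.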
