Hi lianbo,

On Tue, Oct 28, 2025 at 9:57 PM Lianbo Jiang <[email protected]> wrote:
>
> Recently we have observed some failures as below:
>
>   crash> set 2276866
>   set: invalid kernel virtual address: 0  type: "stack contents"
>   set: read of stack at 0 failed
>
>   crash> ps 2276866
>         PID    PPID  CPU       TASK        ST  %MEM   VSZ   RSS  COMM
>     2276866  2276750  47  ff3a19fbd3c80000  ZO   0.0     0     0  sh
>
> This is a regression issue that introduced by adding gdb stack unwind
> support. When attempting to read from the stack, firstly, need to check
> if the stack exists, otherwise it may fail in some corner cases. E.g:
> there are some zombie processes(ZO) and the stack does not exist.
> Furthermore this may also break the switching thread in gdb.
>
> With the patch:
>   crash> set 2276866
>       PID: 2276866
>   COMMAND: "sh"
>      TASK: ff3a19fbd3c80000  [THREAD_INFO: ff3a19fbd3c80000]
>       CPU: 47
>     STATE: EXIT_DEAD|EXIT_ZOMBIE
>
> Reported-by: Buland Kumar Singh <[email protected]>
> Signed-off-by: Lianbo Jiang <[email protected]>
> ---
>  arm64.c  | 2 ++
>  ppc64.c  | 2 ++
>  x86_64.c | 2 ++
>  3 files changed, 6 insertions(+)
>
> diff --git a/arm64.c b/arm64.c
> index 354d17ab6a19..17235950bb60 100644
> --- a/arm64.c
> +++ b/arm64.c
> @@ -234,6 +234,8 @@ arm64_get_current_task_reg(int regno, const char *name,
>
>         BZERO(&bt_setup, sizeof(struct bt_info));
>         clone_bt_info(&bt_setup, &bt_info, tc);
> +       if (bt_info.stackbase == 0)
> +               return FALSE;
>         fill_stackbuf(&bt_info);
>
>         get_dumpfile_regs(&bt_info, &sp, &ip);
> diff --git a/ppc64.c b/ppc64.c
> index d1a506773c93..9c5c0a460c7a 100644
> --- a/ppc64.c
> +++ b/ppc64.c
> @@ -2606,6 +2606,8 @@ ppc64_get_current_task_reg(int regno, const char *name,
> int size,
>
>         BZERO(&bt_setup, sizeof(struct bt_info));
>         clone_bt_info(&bt_setup, &bt_info, tc);
> +       if (bt_info.stackbase == 0)
> +               return FALSE;
>         fill_stackbuf(&bt_info);
>
>         // reusing the get_dumpfile_regs function to get pt regs structure
> diff --git a/x86_64.c b/x86_64.c
> index d7da536d20d8..b2cddbf8ba3d 100644
> --- a/x86_64.c
> +++ b/x86_64.c
> @@ -9383,6 +9383,8 @@ x86_64_get_current_task_reg(int regno, const char *name,
>
>         BZERO(&bt_setup, sizeof(struct bt_info));
>         clone_bt_info(&bt_setup, &bt_info, tc);
> +       if (bt_info.stackbase == 0)
> +               return FALSE;

The fix makes sense to me. However, returning directly leaves the
register cache unrefreshed. That is, with the "return FALSE",
"set 2276866" will succeed in switching tasks, but the register cache
still holds the previous task's registers, so "gdb bt" still prints the
previous stack trace, which is not 2276866's stack.

I suggest adding a warning telling users that gdb-related commands such
as 'bt', 'frame', 'up', 'down' and 'info locals' will not work reliably,
like:

Warning: unable to refresh registers, the output of the following
gdb-related commands is not reliable: 'bt', 'frame', 'up', 'down',
'info locals'.

What do you think?

Thanks,
Tao Liu

>         fill_stackbuf(&bt_info);
>
>         // reusing the get_dumpfile_regs function to get pt regs structure
> --
> 2.50.1
> --
> Crash-utility mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
> Contribution Guidelines: https://github.com/crash-utility/crash/wiki
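
P.S. To make the suggestion concrete, here is a minimal standalone sketch
of the early-return path with the warning added. It is not a patch against
crash: struct bt_info_sketch, regcache_valid_sketch and
check_task_stack_sketch() are hypothetical names invented for this
illustration, and the real code would presumably emit the message through
crash's own error(WARNING, ...) path rather than fprintf().

```c
#include <assert.h>
#include <stdio.h>

#define TRUE  1
#define FALSE 0

/* Hypothetical stand-in for crash's struct bt_info; only the field
 * discussed in this thread is modeled. */
struct bt_info_sketch {
	unsigned long stackbase;
};

/* Hypothetical flag meaning "the gdb register cache matches the
 * currently selected task". Not a real crash variable. */
static int regcache_valid_sketch = TRUE;

/*
 * Mirrors the stackbase check from the patch and adds the proposed
 * warning. Returns FALSE when the task has no stack to read from,
 * TRUE otherwise.
 */
static int
check_task_stack_sketch(const struct bt_info_sketch *bt, FILE *out)
{
	assert(bt != NULL);

	if (bt->stackbase == 0) {
		/* The cache still holds the previous task's registers. */
		regcache_valid_sketch = FALSE;
		fprintf(out,
		    "WARNING: unable to refresh registers, the output of "
		    "the following gdb-related commands is not reliable: "
		    "'bt', 'frame', 'up', 'down', 'info locals'.\n");
		return FALSE;
	}

	regcache_valid_sketch = TRUE;
	return TRUE;
}
```

With something like this, "set <pid>" on a zombie task would still switch
tasks, but the user is told up front that the gdb-side view is stale.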
