applied: 
https://github.com/crash-utility/crash/commit/3d57b4b393558dd4b22312eef251af11bf6530f5

On Mon, Nov 3, 2025 at 8:45 PM lijiang <[email protected]> wrote:
>
> On Mon, Nov 3, 2025 at 11:57 AM Tao Liu <[email protected]> wrote:
>>
>> On Mon, Nov 3, 2025 at 2:58 PM lijiang <[email protected]> wrote:
>> >
>> > On Fri, Oct 31, 2025 at 5:31 AM Tao Liu <[email protected]> wrote:
>> >>
>> >> Hi lianbo,
>> >>
>> >> On Tue, Oct 28, 2025 at 9:57 PM Lianbo Jiang <[email protected]> wrote:
>> >> >
>> >> > Recently we have observed some failures as below:
>> >> >
>> >> >   crash> set 2276866
>> >> >   set: invalid kernel virtual address: 0  type: "stack contents"
>> >> >   set: read of stack at 0 failed
>> >> >
>> >> >   crash> ps 2276866
>> >> >         PID    PPID  CPU       TASK        ST  %MEM      VSZ      RSS  COMM
>> >> >     2276866 2276750  47  ff3a19fbd3c80000  ZO   0.0        0        0  sh
>> >> >
>> >> > This is a regression introduced by adding gdb stack unwind support.
>> >> > Before reading from the stack, we first need to check that the stack
>> >> > exists; otherwise the read may fail in some corner cases. E.g. zombie
>> >> > processes (ZO) have no stack. Furthermore, this failure may also break
>> >> > thread switching in gdb.
>> >> >
>> >> > With the patch:
>> >> >   crash> set 2276866
>> >> >       PID: 2276866
>> >> >   COMMAND: "sh"
>> >> >      TASK: ff3a19fbd3c80000  [THREAD_INFO: ff3a19fbd3c80000]
>> >> >       CPU: 47
>> >> >     STATE: EXIT_DEAD|EXIT_ZOMBIE
>> >> >
>> >> > Reported-by: Buland Kumar Singh <[email protected]>
>> >> > Signed-off-by: Lianbo Jiang <[email protected]>
>> >> > ---
>> >> >  arm64.c  | 2 ++
>> >> >  ppc64.c  | 2 ++
>> >> >  x86_64.c | 2 ++
>> >> >  3 files changed, 6 insertions(+)
>> >> >
>> >> > diff --git a/arm64.c b/arm64.c
>> >> > index 354d17ab6a19..17235950bb60 100644
>> >> > --- a/arm64.c
>> >> > +++ b/arm64.c
>> >> > @@ -234,6 +234,8 @@ arm64_get_current_task_reg(int regno, const char *name,
>> >> >
>> >> >         BZERO(&bt_setup, sizeof(struct bt_info));
>> >> >         clone_bt_info(&bt_setup, &bt_info, tc);
>> >> > +       if (bt_info.stackbase == 0)
>> >> > +               return FALSE;
>> >> >         fill_stackbuf(&bt_info);
>> >> >
>> >> >         get_dumpfile_regs(&bt_info, &sp, &ip);
>> >> > diff --git a/ppc64.c b/ppc64.c
>> >> > index d1a506773c93..9c5c0a460c7a 100644
>> >> > --- a/ppc64.c
>> >> > +++ b/ppc64.c
>> >> > @@ -2606,6 +2606,8 @@ ppc64_get_current_task_reg(int regno, const char *name, int size,
>> >> >
>> >> >         BZERO(&bt_setup, sizeof(struct bt_info));
>> >> >         clone_bt_info(&bt_setup, &bt_info, tc);
>> >> > +       if (bt_info.stackbase == 0)
>> >> > +               return FALSE;
>> >> >         fill_stackbuf(&bt_info);
>> >> >
>> >> >         // reusing the get_dumpfile_regs function to get pt regs structure
>> >> > diff --git a/x86_64.c b/x86_64.c
>> >> > index d7da536d20d8..b2cddbf8ba3d 100644
>> >> > --- a/x86_64.c
>> >> > +++ b/x86_64.c
>> >> > @@ -9383,6 +9383,8 @@ x86_64_get_current_task_reg(int regno, const char *name,
>> >> >
>> >> >         BZERO(&bt_setup, sizeof(struct bt_info));
>> >> >         clone_bt_info(&bt_setup, &bt_info, tc);
>> >> > +       if (bt_info.stackbase == 0)
>> >> > +               return FALSE;
>> >>
>> >> The fix makes sense to me. However, returning directly will leave the
>> >> register cache unrefreshed. That is, with the "return FALSE", "set
>> >> 2276866" will succeed in switching tasks, but the register cache is
>> >> still the old one, so "gdb bt" still outputs the previous stack trace,
>> >> which is not 2276866's stack. I suggest adding a warning telling users
>> >
>> >
>> > Actually, I haven't seen the case you mentioned, and it works as expected:
>> >
>> > Without the patch:
>> > crash> set 2276866
>> > set: invalid kernel virtual address: 0  type: "stack contents"
>> > set: read of stack at 0 failed
>> >
>> > crash> bt
>> > PID: 2276866  TASK: ff3a19fbd3c80000  CPU: 47   COMMAND: "sh"
>> > (no stack)
>> >
>> > crash> gdb bt
>> > #0  crash_setup_regs (oldregs=0x0, newregs=0xff43e468633c7d38) at ./arch/x86/include/asm/processor.h:58
>> > #1  __crash_kexec (regs=regs@entry=0x0) at kernel/kexec_core.c:952
>> > #2  0xffffffff86cf976f in panic (fmt=fmt@entry=0xffffffff87f69f99 "sysrq triggered crash\n") at kernel/panic.c:230
>> > #3  0xffffffff87210201 in sysrq_handle_crash (key=<optimized out>) at drivers/tty/sysrq.c:142
>> > #4  0xffffffff87210b24 in __handle_sysrq (key=99, check_mask=<optimized out>) at drivers/tty/sysrq.c:559
>> > #5  0xffffffff872109cb in write_sysrq_trigger (file=<optimized out>, buf=<optimized out>, count=2, ppos=<optimized out>) at drivers/tty/sysrq.c:1106
>> > #6  0xffffffff86ff5fc9 in proc_reg_write (file=<optimized out>, buf=<optimized out>, count=<optimized out>, ppos=<optimized out>) at fs/proc/inode.c:241
>> > #7  0xffffffff86f6e845 in vfs_write (pos=0xff43e468633c7f08, count=2, buf=0x7ffc5b412780 <error: Cannot access memory at address 0x7ffc5b412780>, file=0xff3a19e92ee37b00) at fs/read_write.c:549
>> > #8  vfs_write (file=0xff3a19e92ee37b00, buf=0x7ffc5b412780 <error: Cannot access memory at address 0x7ffc5b412780>, count=<optimized out>, pos=0xff43e468633c7f08) at fs/read_write.c:533
>> > #9  0xffffffff86f6eacf in ksys_write (fd=<optimized out>, buf=0x7ffc5b412780 <error: Cannot access memory at address 0x7ffc5b412780>, count=2) at fs/read_write.c:598
>> > #10 0xffffffff86c03cab in do_syscall_64 (nr=1, regs=0xff43e468633c7f58) at arch/x86/entry/common.c:303
>> > #11 0xffffffff8780012e in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:147
>> > crash>
>> >
>> > The above case breaks thread switching in gdb, as I mentioned in the
>> > patch log.
>> >
>> > With the patch:
>> > crash> set 2276866
>> >     PID: 2276866
>> > COMMAND: "sh"
>> >    TASK: ff3a19fbd3c80000  [THREAD_INFO: ff3a19fbd3c80000]
>> >     CPU: 47
>> >   STATE: EXIT_DEAD|EXIT_ZOMBIE
>> >
>> > crash> bt
>> > PID: 2276866  TASK: ff3a19fbd3c80000  CPU: 47   COMMAND: "sh"
>> > (no stack)
>> >
>> > crash> gdb bt
>> > crash>
>> >
>> > That is expected behavior, and I did not see the case that you pointed out.
>> >
>> >
>> >> that gdb-related commands such as 'bt', 'frame', 'up', 'down', 'info
>> >> locals' will not work, like:
>> >
>> >
>> > Have you reproduced the case where the register cache is not refreshed?
>>
>> Right, I re-tested the patch and it works as expected. Sorry for the
>> confusion. For the patch, ack.
>>
>
> No worries. Thanks for the review, Tao.
>
> Lianbo
>
>> Thanks,
>> Tao Liu
>>
>> >
>> > Thanks
>> > Lianbo
>> >
>> >>
>> >> Warning: registers unable to refresh, the outputs of the following gdb
>> >> related commands are not reliable: 'bt', 'frame', 'up', 'down', 'info
>> >> locals'.
>> >>
>> >> What do you think?
>> >>
>> >> Thanks,
>> >> Tao Liu
>> >>
>> >>
>> >>
>> >> >         fill_stackbuf(&bt_info);
>> >> >
>> >> >         // reusing the get_dumpfile_regs function to get pt regs structure
>> >> > --
>> >> > 2.50.1
>>
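
For readers skimming the thread, the guard added by the patch can be
illustrated in isolation. The following is only a minimal sketch under
stated assumptions, not crash's actual code: struct demo_bt_info,
demo_get_task_reg() and the DEMO_* constants are hypothetical stand-ins
for crash's struct bt_info and the per-architecture
*_get_current_task_reg() functions, and the "register value" it returns
is a placeholder. It only shows the early return when a task (such as a
zombie) has a stack base of 0.

```c
#include <assert.h>

#define DEMO_TRUE  1
#define DEMO_FALSE 0

/* Hypothetical stand-in for crash's struct bt_info; only the fields
 * needed to illustrate the guard are modeled. */
struct demo_bt_info {
    unsigned long long stackbase;  /* 0 when the task has no stack */
    unsigned long long stacktop;
};

/* Hypothetical register fetch. Mirrors the shape of the patch: bail out
 * before touching the stack when stackbase is 0, leaving *out untouched. */
static int demo_get_task_reg(const struct demo_bt_info *bt,
                             unsigned long long *out)
{
    if (bt->stackbase == 0)
        return DEMO_FALSE;      /* no stack: nothing to read */

    /* Placeholder for "read the stack and extract a register". */
    *out = bt->stackbase;
    return DEMO_TRUE;
}

/* Small self-check covering both the zombie and the live-task case. */
static void demo_self_check(void)
{
    struct demo_bt_info zombie = { 0, 0 };
    struct demo_bt_info live = { 0xff43e468633c4000ULL,
                                 0xff43e468633c8000ULL };
    unsigned long long reg = 0;

    assert(demo_get_task_reg(&zombie, &reg) == DEMO_FALSE);
    assert(reg == 0);           /* output not modified on failure */

    assert(demo_get_task_reg(&live, &reg) == DEMO_TRUE);
    assert(reg == live.stackbase);
}
```

The point of returning early rather than attempting the read is that the
caller sees a clean failure instead of an "invalid kernel virtual
address: 0" error from the stack read.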