[Crash-utility] Re: [PATCH v7 00/15] gdb stack unwinding support for crash utility

Tao Liu Thu, 12 Sep 2024 23:56:14 -0700

Hi lianbo,

On Wed, Sep 11, 2024 at 7:32 PM lijiang <[email protected]> wrote:
>
> On Wed, Sep 11, 2024 at 2:36 PM Tao Liu <[email protected]> wrote:
>>
>> Hi lianbo,
>>
>> On Wed, Sep 11, 2024 at 2:26 PM lijiang <[email protected]> wrote:
>> >
>> > Hi, Tao
>> >
>> > Thank you for the update.
>> >
>> > The following patch fixes a regression, so I would prefer to discuss it as a
>> > separate patch.
>> > [PATCH v7 01/15] Fix the regression of cpumask_t for xen hyper
>
>
> Can you also post v2 for this one? I have two comments about it:
> [1] Is it possible to avoid introducing hyper-specific code into a common
> module such as tools.c?
> [2] For the IA64 arch, I saw that machdep->get_irq_affinity =
> generic_get_irq_affinity is registered (see ia64_init()).
>
>
>> >
>> > In addition, I found another issue in my tests (on ppc64le): gdb bt can
>> > display the backtrace for the panic task, but when I switch to another
>> > task, gdb bt cannot display the backtrace:
>> >
>> > crash> gdb bt
>> > #0  0xc0000000002bde04 in crash_setup_regs (newregs=0xc00000003264b858, 
>> > oldregs=0x0) at ./arch/powerpc/include/asm/kexec.h:133
>> > #1  0xc0000000002be4f8 in __crash_kexec (regs=0x0) at 
>> > kernel/crash_core.c:122
>> > #2  0xc00000000016c254 in panic (fmt=0xc0000000015eef20 "sysrq triggered 
>> > crash\n") at kernel/panic.c:373
>> > #3  0xc000000000a708b8 in sysrq_handle_crash (key=<optimized out>) at 
>> > drivers/tty/sysrq.c:154
>> > #4  0xc000000000a713d4 in __handle_sysrq (key=key@entry=99 'c', 
>> > check_mask=check_mask@entry=false) at drivers/tty/sysrq.c:612
>> > #5  0xc000000000a71e94 in write_sysrq_trigger (file=<optimized out>, 
>> > buf=<optimized out>, count=2, ppos=<optimized out>) at 
>> > drivers/tty/sysrq.c:1181
>> > #6  0xc00000000073260c in pde_write (pde=0xc00000000af9cc00, 
>> > file=<optimized out>, buf=<optimized out>, count=<optimized out>, 
>> > ppos=<optimized out>) at fs/proc/inode.c:334
>> > #7  proc_reg_write (file=<optimized out>, buf=<optimized out>, 
>> > count=<optimized out>, ppos=<optimized out>) at fs/proc/inode.c:346
>> > #8  0xc00000000063c0e0 in vfs_write (file=0xc0000000092d2900, 
>> > buf=0x10012536f60 <error: Cannot access memory at address 0x10012536f60>, 
>> > count=2, pos=0xc00000003264bd30) at fs/read_write.c:588
>> > #9  vfs_write (file=0xc0000000092d2900, buf=0x10012536f60 <error: Cannot 
>> > access memory at address 0x10012536f60>, count=<optimized out>, 
>> > pos=0xc00000003264bd30) at fs/read_write.c:570
>> > #10 0xc00000000063c690 in ksys_write (fd=<optimized out>, 
>> > buf=0x10012536f60 <error: Cannot access memory at address 0x10012536f60>, 
>> > count=2) at fs/read_write.c:643
>> > #11 0xc000000000031a28 in system_call_exception (regs=0xc00000003264be80, 
>> > r0=<optimized out>) at arch/powerpc/kernel/syscall.c:153
>> > #12 0xc00000000000d05c in system_call_vectored_common () at 
>> > arch/powerpc/kernel/interrupt_64.S:198
>> >
>> > crash> ps
>> >       PID    PPID  CPU       TASK        ST  %MEM      VSZ      RSS  COMM
>> >         0       0   0  c000000002bda980  RU   0.0        0        0  [swapper/0]
>> > >       0       0   1  c000000003864c80  RU   0.0        0        0  [swapper/1]
>> > ...
>> >      8017     923   0  c000000043a20000  IN   0.2    22528    16256  sshd-session
>> >      8025    8017   6  c000000032271880  IN   0.1    22784    11840  sshd-session
>> > >    8026    8025   0  c000000043a26600  RU   0.1     9664     6208  bash
>> > ...
>> >     11645       2   3  c000000032264c80  ID   0.0        0        0  [kworker/u32:2]
>> >     11738    6188   2  c00000003811b180  IN   0.1    43520     9408  pickup
>> >     12326       2   0  c00000003226b280  ID   0.0        0        0  [kworker/0:1]
>> >     13112    6089   2  c00000000c809900  IN   0.0     7232     3456  sleep
>> >
>> > Let's take the "pickup" task as an example:
>> >
>> > crash> set 11738
>> >     PID: 11738
>> > COMMAND: "pickup"
>> >    TASK: c00000003811b180  [THREAD_INFO: c00000003811b180]
>> >     CPU: 2
>> >   STATE: TASK_INTERRUPTIBLE
>> >
>> > crash> gdb bt
>> > #0  0xc0000000a7f876a0 in ?? ()
>> > gdb: gdb request failed: bt
>> > crash> set gdb on
>> > gdb: on
>> > gdb> bt
>> > #0  0xc0000000a7f876a0 in ?? ()
>> > gdb>
>>
>> There is a bug in crash on ppc64 with newer kernel versions: the code
>> that determines the address of pt_regs on the stack is outdated. See the
>> following code from crash:
>>
>> ppc64.c:get_ppc64_frame()
>> readmem(sp+STACK_FRAME_OVERHEAD, KVADDR, &regs,
>>         sizeof(struct ppc64_pt_regs), "PPC64 pt_regs", FAULT_ON_ERROR);
>>
>> pt_regs is expected to be located at sp+STACK_FRAME_OVERHEAD, i.e. sp+112.
>>
>> However, since kernel >= v6.2, this offset is no longer correct:
>>
>> linux kernel:arch/powerpc/kernel/process.c:copy_thread():
>> kregs = (struct pt_regs *)(sp + STACK_SWITCH_FRAME_REGS);
>> p->thread.ksp = sp;
>>
>> linux kernel:arch/powerpc/include/asm/ptrace.h:
>> #ifdef CONFIG_PPC64_ELF_ABI_V2
>> #define STACK_FRAME_MIN_SIZE 32
>> #define STACK_SWITCH_FRAME_REGS (STACK_FRAME_MIN_SIZE + 16)
>> ...
>> #endif
>>
>
> Good find, Tao.


The patch to fix this has been posted:
https://www.mail-archive.com/[email protected]/msg01124.html

>
>>
>> If we apply the following change to crash:
>> readmem(sp+0x30, KVADDR, &regs, sizeof(struct ppc64_pt_regs),
>>         "PPC64 pt_regs", FAULT_ON_ERROR);
>>
>> then stack unwinding works as expected. You can test it locally
>> to see whether the above change works for you.
>>
>> So, to me, this bug isn't related to the gdb stack unwinding support;
>> it is just a bug related to newer kernel versions.
>>
>
> Agreed. For [PATCH V7 02/15] - [PATCH V7 15/15]: Ack.
>
> I will put them in the merge queue; once the current issue is
> resolved, we can merge them together. Otherwise, they may not work on the ppc64 arch.

OK, agreed.

Thanks,
Tao Liu
>
>> I think we can post a separate patch to deal with this issue. Since
>> there are plenty of places in crash that use the old
>> STACK_FRAME_OVERHEAD value, they may all need to be updated.
>>
>
> Please go ahead.
>
> Thanks
> Lianbo
>
>>
>> Thanks,
>> Tao Liu
>> >
>> > Anyway, I did the same test on x86_64 and aarch64, and it works as
>> > expected. Can you help double-check on the ppc64 architecture?
>> >
>> > x86_64:
>> > crash> set 14599
>> >     PID: 14599
>> > COMMAND: "pickup"
>> >    TASK: ffff8f57a0d7c180  [THREAD_INFO: ffff8f57a0d7c180]
>> >     CPU: 41
>> >   STATE: TASK_INTERRUPTIBLE
>> > crash> gdb bt
>> > #0  0xffffffff8b3efe29 in context_switch (rq=0xffff8f6f1f835900, 
>> > prev=0xffff8f57a0d7c180, next=0xffff8f5786720000, rf=0xffff9df22fea7b80) 
>> > at kernel/sched/core.c:5208
>> > #1  __schedule (sched_mode=sched_mode@entry=0) at kernel/sched/core.c:6549
>> > #2  0xffffffff8b3f0217 in __schedule_loop (sched_mode=<optimized out>) at 
>> > kernel/sched/core.c:6626
>> > #3  schedule () at kernel/sched/core.c:6641
>> > #4  0xffffffff8b3f6eef in schedule_hrtimeout_range_clock 
>> > (expires=expires@entry=0xffff9df22fea7cb0, delta=<optimized out>, 
>> > delta@entry=99999999, mode=mode@entry=HRTIMER_MODE_ABS, 
>> > clock_id=clock_id@entry=1) at kernel/time/hrtimer.c:2293
>> > #5  0xffffffff8b3f7003 in schedule_hrtimeout_range 
>> > (expires=expires@entry=0xffff9df22fea7cb0, delta=delta@entry=99999999, 
>> > mode=mode@entry=HRTIMER_MODE_ABS) at kernel/time/hrtimer.c:2340
>> > #6  0xffffffff8aae301c in ep_poll (ep=0xffff8f5790d15d40, 
>> > events=events@entry=0x7ffea91b6b90, maxevents=maxevents@entry=100, 
>> > timeout=timeout@entry=0xffff9df22fea7d58) at fs/eventpoll.c:2062
>> > #7  0xffffffff8aae3138 in do_epoll_wait (epfd=epfd@entry=8, 
>> > events=events@entry=0x7ffea91b6b90, maxevents=maxevents@entry=100, 
>> > to=0xffff9df22fea7d58) at fs/eventpoll.c:2464
>> > #8  0xffffffff8aae44a1 in __do_sys_epoll_wait (epfd=<optimized out>, 
>> > events=0x7ffea91b6b90, maxevents=<optimized out>, timeout=<optimized out>) 
>> > at fs/eventpoll.c:2476
>> > #9  __se_sys_epoll_wait (epfd=<optimized out>, events=<optimized out>, 
>> > maxevents=<optimized out>, timeout=<optimized out>) at fs/eventpoll.c:2471
>> > #10 __x64_sys_epoll_wait (regs=<optimized out>) at fs/eventpoll.c:2471
>> > #11 0xffffffff8b3e293d in do_syscall_x64 (regs=0xffff9df22fea7f48, nr=232) 
>> > at arch/x86/entry/common.c:52
>> > #12 do_syscall_64 (regs=0xffff9df22fea7f48, nr=232) at 
>> > arch/x86/entry/common.c:83
>> > #13 0xffffffff8b40012f in entry_SYSCALL_64 () at 
>> > arch/x86/entry/entry_64.S:121
>> > crash>
>> >
>> >
>> > aarch64:
>> > crash> set 9338
>> >     PID: 9338
>> > COMMAND: "pickup"
>> >    TASK: ffff0000c7b05400  [THREAD_INFO: ffff0000c7b05400]
>> >     CPU: 3
>> >   STATE: TASK_INTERRUPTIBLE
>> > crash> gdb bt
>> > #0  __switch_to (prev=<unavailable>, prev@entry=0xffff0000c7b05400, 
>> > next=next@entry=<unavailable>) at arch/arm64/kernel/process.c:555
>> > #1  0xffffafc5b5ebd744 in context_switch (rq=0xffff00077bbd0ec0, 
>> > prev=0xffff0000c7b05400, next=<unavailable>, rf=0xffff80008ac63a60) at 
>> > kernel/sched/core.c:5208
>> > #2  __schedule (sched_mode=sched_mode@entry=0) at kernel/sched/core.c:6549
>> > #3  0xffffafc5b5ebdc2c in __schedule_loop (sched_mode=<optimized out>) at 
>> > kernel/sched/core.c:6626
>> > #4  schedule () at kernel/sched/core.c:6641
>> > #5  0xffffafc5b5ec6030 in schedule_hrtimeout_range_clock 
>> > (expires=expires@entry=0xffff80008ac63be8, delta=delta@entry=99999999, 
>> > mode=mode@entry=HRTIMER_MODE_ABS, clock_id=clock_id@entry=1) at 
>> > kernel/time/hrtimer.c:2293
>> > #6  0xffffafc5b5ec618c in schedule_hrtimeout_range 
>> > (expires=expires@entry=0xffff80008ac63be8, delta=delta@entry=99999999, 
>> > mode=mode@entry=HRTIMER_MODE_ABS) at kernel/time/hrtimer.c:2340
>> > #7  0xffffafc5b545d33c in ep_poll (ep=<unavailable>, 
>> > events=events@entry=0xffffde5c3f68, maxevents=maxevents@entry=100, 
>> > timeout=timeout@entry=0xffff80008ac63ce0) at fs/eventpoll.c:2062
>> > #8  0xffffafc5b545d4e4 in do_epoll_wait (epfd=epfd@entry=8, 
>> > events=events@entry=0xffffde5c3f68, maxevents=maxevents@entry=100, 
>> > to=to@entry=0xffff80008ac63ce0) at fs/eventpoll.c:2464
>> > #9  0xffffafc5b545d534 in do_epoll_pwait (epfd=epfd@entry=8, 
>> > events=events@entry=0xffffde5c3f68, maxevents=maxevents@entry=100, 
>> > to=to@entry=0xffff80008ac63ce0, sigsetsize=<optimized out>, 
>> > sigmask=<optimized out>) at fs/eventpoll.c:2498
>> > #10 0xffffafc5b545e7c8 in do_epoll_pwait (epfd=8, events=0xffffde5c3f68, 
>> > maxevents=100, to=0xffff80008ac63ce0, sigmask=<optimized out>, 
>> > sigsetsize=<optimized out>) at fs/eventpoll.c:2495
>> > #11 __do_sys_epoll_pwait (epfd=8, events=0xffffde5c3f68, maxevents=100,
>> > timeout=<optimized out>, sigmask=<optimized out>,
>> > sigsetsize=<optimized out>) at fs/eventpoll.c:2511
>> > #12 __se_sys_epoll_pwait (epfd=8, events=281474412330856, maxevents=100,
>> > timeout=<optimized out>, sigmask=<optimized out>,
>> > sigsetsize=<optimized out>) at fs/eventpoll.c:2505
>> > #13 __arm64_sys_epoll_pwait (regs=<optimized out>) at fs/eventpoll.c:2505
>> > #14 0xffffafc5b4fa99bc in __invoke_syscall (regs=0xffff80008ac63eb0, 
>> > syscall_fn=<optimized out>) at arch/arm64/kernel/syscall.c:35
>> > #15 invoke_syscall (regs=regs@entry=0xffff80008ac63eb0,
>> > scno=<optimized out>, sc_nr=sc_nr@entry=463, syscall_table=<optimized out>)
>> > at arch/arm64/kernel/syscall.c:49
>> > #16 0xffffafc5b4fa9ac8 in el0_svc_common (sc_nr=463,
>> > syscall_table=<optimized out>, regs=0xffff80008ac63eb0,
>> > scno=<optimized out>) at arch/arm64/kernel/syscall.c:132
>> > #17 do_el0_svc (regs=regs@entry=0xffff80008ac63eb0) at 
>> > arch/arm64/kernel/syscall.c:151
>> > #18 0xffffafc5b5eb6fa4 in el0_svc (regs=0xffff80008ac63eb0) at 
>> > arch/arm64/kernel/entry-common.c:712
>> > #19 0xffffafc5b5eb74c0 in el0t_64_sync_handler (regs=<optimized out>) at 
>> > arch/arm64/kernel/entry-common.c:730
>> > #20 0xffffafc5b4f91634 in el0t_64_sync () at arch/arm64/kernel/entry.S:598
>> > crash>
>> >
>> > BTW: the other changes look fine to me.
>> >
>> > Thanks
>> > Lianbo
>> >
>> > On Wed, Sep 4, 2024 at 3:54 PM <[email protected]> wrote:
>> >>
>> >> Date: Wed,  4 Sep 2024 19:49:25 +1200
>> >> From: Tao Liu <[email protected]>
>> >> Subject: [Crash-utility] [PATCH v7 00/15] gdb stack unwinding support
>> >>         for crash utility
>> >> To: [email protected]
>> >> Cc: Tao Liu <[email protected]>
>> >> Message-ID: <[email protected]>
>> >> Content-Type: text/plain; charset=UTF-8
>> >>
>> >> This patchset is a rebased/merged version of the following 3 patchsets:
>> >>
>> >> 1): [PATCH v10 0/5] Improve stack unwind on ppc64 [1]
>> >> 2): [PATCH 0/5] x86_64 gdb stack unwinding support [2]
>> >> 3): Clean up on top of one-thread-v2 [3]
>> >>
>> >> A complete description of gdb stack unwinding support for crash can be
>> >> found in [1].
>> >>
>> >> This patchset can be divided into the following 3 parts:
>> >>
>> >> 1) part1: preparations for stack unwinding support, i.e. fixes for some
>> >>           bugs/regressions found while drafting this patchset.
>> >> 2) part2: common code for all CPU archs, mainly in the
>> >>           crash_target.c/gdb_interface.c files, in order to
>> >>           support different archs.
>> >> 3) part3: arch-specific stack unwinding support for
>> >>           ppc64/x86_64/arm64/vmware.
>> >>
>> >> === part 3
>> >> arm64: Add gdb stack unwinding support
>> >> vmware_guestdump: Various format versions support
>> >> x86_64: Add gdb stack unwinding support
>> >> ppc64: correct gdb passthroughs by implementing machdep->get_current_task_reg
>> >>
>> >> === part 2
>> >> Conditionally output gdb stack unwinding stop reasons
>> >> Stop stack unwinding at non-kernel address
>> >> Print task pid/command instead of CPU index
>> >> Rename get_cpu_reg to get_current_task_reg
>> >> Let crash change gdb context
>> >> Leave only one gdb thread for crash
>> >> Remove 'frame' from prohibited commands list
>> >>
>> >> === part 1
>> >> Fix gdb_interface: restore gdb's output streams at end of gdb_interface
>> >> x86_64: Fix invalid input "=>" for bt command
>> >> Fix cpumask_t recursive dependence issue
>> >> Fix the regression of cpumask_t for xen hyper
>> >> ===
>> >>
>> >> v7 -> v6:
>> >> 1) Reorganised the patchset, re-dividing it into 3 parts instead of the
>> >>    previous 2 parts.
>> >> 2) Reworked the cpumask_t part, which addresses comment No.4
>> >>    pointed out by lianbo in [4].
>> >> 3) Added conditional output for the gdb stack unwinding failure messages;
>> >>    see [PATCH 11/15] Conditionally output gdb stack unwinding stop reasons
>> >> 4) Redrafted the commit messages and updated some outdated info.
>> >> 5) Merged "Let crash change gdb context" and "set_context(): check if
>> >>    context is already current" into one.
>> >>
>> >> [4]: https://www.mail-archive.com/[email protected]/msg01067.html
>> >>
>> >> v6 -> v5:
>> >> 1) Refactored patches 4 & 9, which change the function signature of
>> >>    get_cpu_reg/get_current_task_reg, so that each patch compiles without
>> >>    errors when applied in sequence.
>> >> 2) Rebased the patchset on top of the latest upstream:
>> >>    ("79b93ecb2e72ec Fix a "Bus error" issue caused by 'crash --osrelease' or
>> >>    crash loading")
>> >>
>> >> v5 -> v4:
>> >> 1) Plenty of code refactoring based on Lianbo's comments on v4.
>> >> 2) Removed the magic number when dealing with the regs bitmap, see [6].
>> >> 3) Rebased the patchset on top of the latest upstream:
>> >>    ("1c6da3eaff8207 arm64: Fix bt command show wrong stacktrace on
>> >>    ramdump source")
>> >>
>> >> v4 -> v3:
>> >> Fixed the author issue in [PATCH v3 06/16] Fix gdb_interface: restore gdb's
>> >> output streams at end of gdb_interface.
>> >>
>> >> v3 -> v2:
>> >> 1) Updated CC list as pointed out in [4]
>> >> 2) Fixed compilation issues reported in [5]
>> >>
>> >> v2 -> v1:
>> >> 1) Added the patch: x86_64: Fix invalid input "=>" for bt command,
>> >>    thanks to Kazu's testing.
>> >> 2) Modified the patch: x86_64: Add gdb stack unwinding support, adding
>> >>    pcp_save, spp_save and sp to restore the values, matching the original
>> >>    code logic.
>> >>
>> >> [1]: https://www.mail-archive.com/[email protected]/msg00469.html
>> >> [2]: https://www.mail-archive.com/[email protected]/msg00488.html
>> >> [3]: https://www.mail-archive.com/[email protected]/msg00554.html
>> >> [4]: https://www.mail-archive.com/[email protected]/msg00681.html
>> >> [5]: https://www.mail-archive.com/[email protected]/msg00715.html
>> >> [6]: https://www.mail-archive.com/[email protected]/msg00819.html
>> >>
>> >> Aditya Gupta (3):
>> >>   Fix gdb_interface: restore gdb's output streams at end of
>> >>     gdb_interface
>> >>   Remove 'frame' from prohibited commands list
>> >>   ppc64: correct gdb passthroughs by implementing
>> >>     machdep->get_current_task_reg
>> >>
>> >> Alexey Makhalov (1):
>> >>   vmware_guestdump: Various format versions support
>> >>
>> >> Tao Liu (11):
>> >>   Fix the regression of cpumask_t for xen hyper
>> >>   Fix cpumask_t recursive dependence issue
>> >>   x86_64: Fix invalid input "=>" for bt command
>> >>   Leave only one gdb thread for crash
>> >>   Let crash change gdb context
>> >>   Rename get_cpu_reg to get_current_task_reg
>> >>   Print task pid/command instead of CPU index
>> >>   Stop stack unwinding at non-kernel address
>> >>   Conditionally output gdb stack unwinding stop reasons
>> >>   x86_64: Add gdb stack unwinding support
>> >>   arm64: Add gdb stack unwinding support
>> >>
>> >>  arm64.c            | 120 +++++++++++++++--
>> >>  crash_target.c     |  71 ++++++----
>> >>  defs.h             | 194 ++++++++++++++++++++++++++-
>> >>  gdb-10.2.patch     |  96 ++++++++++++++
>> >>  gdb_interface.c    |  39 ++----
>> >>  kernel.c           |  63 +++++++--
>> >>  ppc64.c            | 174 +++++++++++++++++++++++-
>> >>  symbols.c          |  15 +++
>> >>  task.c             |  34 +++--
>> >>  tools.c            |  16 ++-
>> >>  unwind_x86_64.h    |   4 -
>> >>  vmware_guestdump.c | 321 +++++++++++++++++++++++++++++++-------------
>> >>  x86_64.c           | 323 ++++++++++++++++++++++++++++++++++++++++-----
>> >>  13 files changed, 1247 insertions(+), 223 deletions(-)
>> >>
>> >> --
>> >> 2.40.1
>>
--
Crash-utility mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
Contribution Guidelines: https://github.com/crash-utility/crash/wiki
