[Crash-utility] Re: [PATCH v7 00/15] gdb stack unwinding support for crash utility

Alexey Makhalov Wed, 06 Nov 2024 15:08:37 -0800

Glad to see it merged! Many colleagues were waiting for it. Thanks to
everyone who contributed to this effort. --Alexey


On Wed, Nov 6, 2024 at 12:29 AM Aditya Gupta <[email protected]> wrote:

> Hi all,
>
>
> Thanks Lianbo, Tao, Alexey and Daisuke for your reviews on this series.
>
> Feels amazing to finally see this merged!
>
> Thank you, Tao, for collaborating on this for so many months!
>
> Hope this helps many people. I have been pinged by multiple people in
> dev and support teams who say this information can help them classify
> which subsystem an issue might be in.
>
>
> Thanks again,
>
> - Aditya Gupta
>
>
> On 04/11/24 13:39, lijiang wrote:
> > Thank you for working on this feature, Aditya, Tao and Alex. Great job!
> >
> > For the [PATCH v7 02/15 -15/15], rearranged them with minor changes:
> >
> > [1] https://github.com/crash-utility/crash/commit/21e0a345f97324b3472d573ed20ef098f0300fac
> > [2] https://github.com/crash-utility/crash/commit/c4db469af091edd1ea0897fbce41bc175375314b
> > [3] https://github.com/crash-utility/crash/commit/7c8a7dddda66b3d1043ba99516de57691033154a
> > [4] https://github.com/crash-utility/crash/commit/1fd80c623c205443fdd2a29b14c5230a09984147
> > [5] https://github.com/crash-utility/crash/commit/6dfda0d2235574cf80530ea92e0ddff270f9c039
> > [6] https://github.com/crash-utility/crash/commit/89ff1e45734457eb66905ef656775fcfd1b46aec
> > [7] https://github.com/crash-utility/crash/commit/968debd0d5979dd9ddca3af0766bad714dbd51e3
> >
> > BTW: there are still some known issues with this series, but none of
> > them are critical, so they can be fixed later.
> >
> > Reminder: the current patchset has changed some function interfaces,
> > which may affect crash extensions.
> >
> > Thanks
> > Lianbo
> >
> > On Wed, Sep 11, 2024 at 10:25 AM lijiang <[email protected]> wrote:
> >
> >     Hi, Tao
> >
> >     Thank you for the update.
> >
> >     The following patch fixes a regression, so I would prefer to
> >     discuss it as a separate patch:
> >     [PATCH v7 01/15] Fix the regression of cpumask_t for xen hyper
> >
> >     In addition, I found another issue in my tests (on ppc64le): gdb bt
> >     can display the backtrace for the panic task, but when I switch to
> >     another task, gdb bt cannot display the backtrace:
> >
> >     crash> gdb bt
> >     #0  0xc0000000002bde04 in crash_setup_regs
> >     (newregs=0xc00000003264b858, oldregs=0x0) at
> >     ./arch/powerpc/include/asm/kexec.h:133
> >     #1  0xc0000000002be4f8 in __crash_kexec (regs=0x0) at
> >     kernel/crash_core.c:122
> >     #2  0xc00000000016c254 in panic (fmt=0xc0000000015eef20 "sysrq
> >     triggered crash\n") at kernel/panic.c:373
> >     #3  0xc000000000a708b8 in sysrq_handle_crash (key=<optimized out>)
> >     at drivers/tty/sysrq.c:154
> >     #4  0xc000000000a713d4 in __handle_sysrq (key=key@entry=99 'c',
> >     check_mask=check_mask@entry=false) at drivers/tty/sysrq.c:612
> >     #5  0xc000000000a71e94 in write_sysrq_trigger (file=<optimized
> >     out>, buf=<optimized out>, count=2, ppos=<optimized out>) at
> >     drivers/tty/sysrq.c:1181
> >     #6  0xc00000000073260c in pde_write (pde=0xc00000000af9cc00,
> >     file=<optimized out>, buf=<optimized out>, count=<optimized out>,
> >     ppos=<optimized out>) at fs/proc/inode.c:334
> >     #7  proc_reg_write (file=<optimized out>, buf=<optimized out>,
> >     count=<optimized out>, ppos=<optimized out>) at fs/proc/inode.c:346
> >     #8  0xc00000000063c0e0 in vfs_write (file=0xc0000000092d2900,
> >     buf=0x10012536f60 <error: Cannot access memory at address
> >     0x10012536f60>, count=2, pos=0xc00000003264bd30) at
> >     fs/read_write.c:588
> >     #9  vfs_write (file=0xc0000000092d2900, buf=0x10012536f60 <error:
> >     Cannot access memory at address 0x10012536f60>, count=<optimized
> >     out>, pos=0xc00000003264bd30) at fs/read_write.c:570
> >     #10 0xc00000000063c690 in ksys_write (fd=<optimized out>,
> >     buf=0x10012536f60 <error: Cannot access memory at address
> >     0x10012536f60>, count=2) at fs/read_write.c:643
> >     #11 0xc000000000031a28 in system_call_exception
> >     (regs=0xc00000003264be80, r0=<optimized out>) at
> >     arch/powerpc/kernel/syscall.c:153
> >     #12 0xc00000000000d05c in system_call_vectored_common () at
> >     arch/powerpc/kernel/interrupt_64.S:198
> >
> >     crash> ps
> >           PID    PPID  CPU       TASK        ST  %MEM  VSZ      RSS  COMM
> >             0       0   0  c000000002bda980  RU   0.0    0        0  [swapper/0]
> >     >       0       0   1  c000000003864c80  RU   0.0      0        0  [swapper/1]
> >     ...
> >          8017     923   0  c000000043a20000  IN   0.2  22528    16256  sshd-session
> >          8025    8017   6  c000000032271880  IN   0.1  22784    11840  sshd-session
> >     >    8026    8025   0  c000000043a26600  RU   0.1   9664     6208  bash
> >     ...
> >         11645       2   3  c000000032264c80  ID   0.0    0        0  [kworker/u32:2]
> >         11738    6188   2  c00000003811b180  IN   0.1  43520     9408  pickup
> >         12326       2   0  c00000003226b280  ID   0.0    0        0  [kworker/0:1]
> >         13112    6089   2  c00000000c809900  IN   0.0 7232     3456  sleep
> >
> >     Let's take the "pickup" task as an example:
> >
> >     crash> set 11738
> >         PID: 11738
> >     COMMAND: "pickup"
> >        TASK: c00000003811b180  [THREAD_INFO: c00000003811b180]
> >         CPU: 2
> >       STATE: TASK_INTERRUPTIBLE
> >
> >     crash> gdb bt
> >     #0  0xc0000000a7f876a0 in ?? ()
> >     gdb: gdb request failed: bt
> >     crash> set gdb on
> >     gdb: on
> >     gdb> bt
> >     #0  0xc0000000a7f876a0 in ?? ()
> >     gdb>
> >
> >     I ran the same test on x86_64 and aarch64, and it works as
> >     expected there. Could you help double-check on the ppc64 architecture?
> >
> >     x86_64:
> >     crash> set 14599
> >         PID: 14599
> >     COMMAND: "pickup"
> >        TASK: ffff8f57a0d7c180  [THREAD_INFO: ffff8f57a0d7c180]
> >         CPU: 41
> >       STATE: TASK_INTERRUPTIBLE
> >     crash> gdb bt
> >     #0  0xffffffff8b3efe29 in context_switch (rq=0xffff8f6f1f835900,
> >     prev=0xffff8f57a0d7c180, next=0xffff8f5786720000,
> >     rf=0xffff9df22fea7b80) at kernel/sched/core.c:5208
> >     #1  __schedule (sched_mode=sched_mode@entry=0) at
> >     kernel/sched/core.c:6549
> >     #2  0xffffffff8b3f0217 in __schedule_loop (sched_mode=<optimized
> >     out>) at kernel/sched/core.c:6626
> >     #3  schedule () at kernel/sched/core.c:6641
> >     #4  0xffffffff8b3f6eef in schedule_hrtimeout_range_clock
> >     (expires=expires@entry=0xffff9df22fea7cb0, delta=<optimized out>,
> >     delta@entry=99999999, mode=mode@entry=HRTIMER_MODE_ABS,
> >     clock_id=clock_id@entry=1) at kernel/time/hrtimer.c:2293
> >     #5  0xffffffff8b3f7003 in schedule_hrtimeout_range
> >     (expires=expires@entry=0xffff9df22fea7cb0,
> >     delta=delta@entry=99999999, mode=mode@entry=HRTIMER_MODE_ABS) at
> >     kernel/time/hrtimer.c:2340
> >     #6  0xffffffff8aae301c in ep_poll (ep=0xffff8f5790d15d40,
> >     events=events@entry=0x7ffea91b6b90, maxevents=maxevents@entry=100,
> >     timeout=timeout@entry=0xffff9df22fea7d58) at fs/eventpoll.c:2062
> >     #7  0xffffffff8aae3138 in do_epoll_wait (epfd=epfd@entry=8,
> >     events=events@entry=0x7ffea91b6b90, maxevents=maxevents@entry=100,
> >     to=0xffff9df22fea7d58) at fs/eventpoll.c:2464
> >     #8  0xffffffff8aae44a1 in __do_sys_epoll_wait (epfd=<optimized
> >     out>, events=0x7ffea91b6b90, maxevents=<optimized out>,
> >     timeout=<optimized out>) at fs/eventpoll.c:2476
> >     #9  __se_sys_epoll_wait (epfd=<optimized out>, events=<optimized
> >     out>, maxevents=<optimized out>, timeout=<optimized out>) at
> >     fs/eventpoll.c:2471
> >     #10 __x64_sys_epoll_wait (regs=<optimized out>) at
> >     fs/eventpoll.c:2471
> >     #11 0xffffffff8b3e293d in do_syscall_x64 (regs=0xffff9df22fea7f48,
> >     nr=232) at arch/x86/entry/common.c:52
> >     #12 do_syscall_64 (regs=0xffff9df22fea7f48, nr=232) at
> >     arch/x86/entry/common.c:83
> >     #13 0xffffffff8b40012f in entry_SYSCALL_64 () at
> >     arch/x86/entry/entry_64.S:121
> >     crash>
> >
> >
> >     aarch64:
> >     crash> set 9338
> >         PID: 9338
> >     COMMAND: "pickup"
> >        TASK: ffff0000c7b05400  [THREAD_INFO: ffff0000c7b05400]
> >         CPU: 3
> >       STATE: TASK_INTERRUPTIBLE
> >     crash> gdb bt
> >     #0  __switch_to (prev=<unavailable>,
> >     prev@entry=0xffff0000c7b05400, next=next@entry=<unavailable>) at
> >     arch/arm64/kernel/process.c:555
> >     #1  0xffffafc5b5ebd744 in context_switch (rq=0xffff00077bbd0ec0,
> >     prev=0xffff0000c7b05400, next=<unavailable>,
> >     rf=0xffff80008ac63a60) at kernel/sched/core.c:5208
> >     #2  __schedule (sched_mode=sched_mode@entry=0) at
> >     kernel/sched/core.c:6549
> >     #3  0xffffafc5b5ebdc2c in __schedule_loop (sched_mode=<optimized
> >     out>) at kernel/sched/core.c:6626
> >     #4  schedule () at kernel/sched/core.c:6641
> >     #5  0xffffafc5b5ec6030 in schedule_hrtimeout_range_clock
> >     (expires=expires@entry=0xffff80008ac63be8,
> >     delta=delta@entry=99999999, mode=mode@entry=HRTIMER_MODE_ABS,
> >     clock_id=clock_id@entry=1) at kernel/time/hrtimer.c:2293
> >     #6  0xffffafc5b5ec618c in schedule_hrtimeout_range
> >     (expires=expires@entry=0xffff80008ac63be8,
> >     delta=delta@entry=99999999, mode=mode@entry=HRTIMER_MODE_ABS) at
> >     kernel/time/hrtimer.c:2340
> >     #7  0xffffafc5b545d33c in ep_poll (ep=<unavailable>,
> >     events=events@entry=0xffffde5c3f68, maxevents=maxevents@entry=100,
> >     timeout=timeout@entry=0xffff80008ac63ce0) at fs/eventpoll.c:2062
> >     #8  0xffffafc5b545d4e4 in do_epoll_wait (epfd=epfd@entry=8,
> >     events=events@entry=0xffffde5c3f68, maxevents=maxevents@entry=100,
> >     to=to@entry=0xffff80008ac63ce0) at fs/eventpoll.c:2464
> >     #9  0xffffafc5b545d534 in do_epoll_pwait (epfd=epfd@entry=8,
> >     events=events@entry=0xffffde5c3f68, maxevents=maxevents@entry=100,
> >     to=to@entry=0xffff80008ac63ce0, sigsetsize=<optimized out>,
> >     sigmask=<optimized out>) at fs/eventpoll.c:2498
> >     #10 0xffffafc5b545e7c8 in do_epoll_pwait (epfd=8,
> >     events=0xffffde5c3f68, maxevents=100, to=0xffff80008ac63ce0,
> >     sigmask=<optimized out>, sigsetsize=<optimized out>) at
> >     fs/eventpoll.c:2495
> >     #11 __do_sys_epoll_pwait (epfd=8, events=0xffffde5c3f68,
> >     maxevents=100, timeout=<optimized out>, sigmask=<optimized out>,
> >     sigsetsize=<optimized out>) at fs/eventpoll.c:2511
> >     #12 __se_sys_epoll_pwait (epfd=8, events=281474412330856,
> >     maxevents=100, timeout=<optimized out>, sigmask=<optimized out>,
> >     sigsetsize=<optimized out>) at fs/eventpoll.c:2505
> >     #13 __arm64_sys_epoll_pwait (regs=<optimized out>) at
> >     fs/eventpoll.c:2505
> >     #14 0xffffafc5b4fa99bc in __invoke_syscall
> >     (regs=0xffff80008ac63eb0, syscall_fn=<optimized out>) at
> >     arch/arm64/kernel/syscall.c:35
> >     #15 invoke_syscall (regs=regs@entry=0xffff80008ac63eb0,
> >     scno=<optimized out>, sc_nr=sc_nr@entry=463,
> >     syscall_table=<optimized out>) at arch/arm64/kernel/syscall.c:49
> >     #16 0xffffafc5b4fa9ac8 in el0_svc_common (sc_nr=463,
> >     syscall_table=<optimized out>, regs=0xffff80008ac63eb0,
> >     scno=<optimized out>) at arch/arm64/kernel/syscall.c:132
> >     #17 do_el0_svc (regs=regs@entry=0xffff80008ac63eb0) at
> >     arch/arm64/kernel/syscall.c:151
> >     #18 0xffffafc5b5eb6fa4 in el0_svc (regs=0xffff80008ac63eb0) at
> >     arch/arm64/kernel/entry-common.c:712
> >     #19 0xffffafc5b5eb74c0 in el0t_64_sync_handler (regs=<optimized
> >     out>) at arch/arm64/kernel/entry-common.c:730
> >     #20 0xffffafc5b4f91634 in el0t_64_sync () at
> >     arch/arm64/kernel/entry.S:598
> >     crash>
> >
> >     BTW: the other changes look fine to me.
> >
> >     Thanks
> >     Lianbo
> >
> >     On Wed, Sep 4, 2024 at 3:54 PM
> >     <[email protected]> wrote:
> >
> >         Date: Wed,  4 Sep 2024 19:49:25 +1200
> >         From: Tao Liu <[email protected]>
> >         Subject: [Crash-utility] [PATCH v7 00/15] gdb stack unwinding
> >         support
> >                 for crash utility
> >         To: [email protected]
> >         Cc: Tao Liu <[email protected]>
> >         Message-ID: <[email protected]>
> >         Content-Type: text/plain; charset=UTF-8
> >
> >         This patchset is a rebase/merged version of the following 3
> >         patchsets:
> >
> >         1): [PATCH v10 0/5] Improve stack unwind on ppc64 [1]
> >         2): [PATCH 0/5] x86_64 gdb stack unwinding support [2]
> >         3): Clean up on top of one-thread-v2 [3]
> >
> >         A complete description of gdb stack unwinding support for
> >         crash can be found in [1].
> >
> >         This patchset can be divided into the following 3 parts:
> >
> >         1) part1: preparations before stack unwinding support, some
> >                   bugs/regressions found when drafting this patchset.
> >         2) part2: common part for all CPU archs, mainly dealing with
> >                   crash_target.c/gdb_interface.c files, in order to
> >                   support different archs.
> >         3) part3: arch specific, for each ppc64/x86_64/arm64/vmware
> >                   stack unwinding support.
> >
> >         === part 3
> >         arm64: Add gdb stack unwinding support
> >         vmware_guestdump: Various format versions support
> >         x86_64: Add gdb stack unwinding support
> >         ppc64: correct gdb passthroughs by implementing
> >         machdep->get_current_task_reg
> >
> >         === part 2
> >         Conditionally output gdb stack unwinding stop reasons
> >         Stop stack unwinding at non-kernel address
> >         Print task pid/command instead of CPU index
> >         Rename get_cpu_reg to get_current_task_reg
> >         Let crash change gdb context
> >         Leave only one gdb thread for crash
> >         Remove 'frame' from prohibited commands list
> >
> >         === part 1
> >         Fix gdb_interface: restore gdb's output streams at end of
> >         gdb_interface
> >         x86_64: Fix invalid input "=>" for bt command
> >         Fix cpumask_t recursive dependence issue
> >         Fix the regression of cpumask_t for xen hyper
> >         ===
> >
> >         v7 -> v6:
> >         1) Reorganised the patchset, dividing it into 3 parts instead of
> >            the previous 2 parts.
> >         2) Reworked the cpumask_t part, which resolves comment No.4
> >            pointed out by Lianbo in [4].
> >         3) Added conditional output for the failure message of gdb stack
> >            unwinding; see [PATCH 11/15] Conditionally output gdb stack
> >            unwinding stop reasons.
> >         4) Redrafted the commit messages and updated some outdated info.
> >         5) Merged "Let crash change gdb context" and "set_context():
> >            check if context is already current" into one.
> >
> >         [4]: https://www.mail-archive.com/[email protected]/msg01067.html
> >
> >         v6 -> v5:
> >         1) Refactored patches 4 & 9, which changed the function signature
> >            of get_cpu_reg/get_current_task_reg, so that each patch
> >            compiles without errors when applied in sequence.
> >         2) Rebased the patchset on top of latest upstream:
> >            ("79b93ecb2e72ec Fix a "Bus error" issue caused by 'crash
> >            --osrelease' or crash loading")
> >
> >         v5 -> v4:
> >         1) Plenty of code refactoring based on Lianbo's comments on v4.
> >         2) Removed the magic number when dealing with regs bitmap, see [6].
> >         3) Rebased the patchset on top of latest upstream:
> >            ("1c6da3eaff8207 arm64: Fix bt command show wrong stacktrace
> >            on ramdump source")
> >
> >         v4 -> v3:
> >         Fixed the author issue in [PATCH v3 06/16] Fix gdb_interface:
> >         restore gdb's output streams at end of gdb_interface.
> >
> >         v3 -> v2:
> >         1) Updated the CC list as pointed out in [4].
> >         2) Addressed the compiling issues in [5].
> >
> >         v2 -> v1:
> >         1) Added the patch: x86_64: Fix invalid input "=>" for bt command,
> >            thanks to Kazu's testing.
> >         2) Modified the patch: x86_64: Add gdb stack unwinding support;
> >            added pcp_save, spp_save and sp to restore the values, matching
> >            the original code logic.
> >
> >         [1]: https://www.mail-archive.com/[email protected]/msg00469.html
> >         [2]: https://www.mail-archive.com/[email protected]/msg00488.html
> >         [3]: https://www.mail-archive.com/[email protected]/msg00554.html
> >         [4]: https://www.mail-archive.com/[email protected]/msg00681.html
> >         [5]: https://www.mail-archive.com/[email protected]/msg00715.html
> >         [6]: https://www.mail-archive.com/[email protected]/msg00819.html
> >
> >         Aditya Gupta (3):
> >           Fix gdb_interface: restore gdb's output streams at end of
> >             gdb_interface
> >           Remove 'frame' from prohibited commands list
> >           ppc64: correct gdb passthroughs by implementing
> >             machdep->get_current_task_reg
> >
> >         Alexey Makhalov (1):
> >           vmware_guestdump: Various format versions support
> >
> >         Tao Liu (11):
> >           Fix the regression of cpumask_t for xen hyper
> >           Fix cpumask_t recursive dependence issue
> >           x86_64: Fix invalid input "=>" for bt command
> >           Leave only one gdb thread for crash
> >           Let crash change gdb context
> >           Rename get_cpu_reg to get_current_task_reg
> >           Print task pid/command instead of CPU index
> >           Stop stack unwinding at non-kernel address
> >           Conditionally output gdb stack unwinding stop reasons
> >           x86_64: Add gdb stack unwinding support
> >           arm64: Add gdb stack unwinding support
> >
> >          arm64.c            | 120 +++++++++++++++--
> >          crash_target.c     |  71 ++++++----
> >          defs.h             | 194 ++++++++++++++++++++++++++-
> >          gdb-10.2.patch     |  96 ++++++++++++++
> >          gdb_interface.c    |  39 ++----
> >          kernel.c           |  63 +++++++--
> >          ppc64.c            | 174 +++++++++++++++++++++++-
> >          symbols.c          |  15 +++
> >          task.c             |  34 +++--
> >          tools.c            |  16 ++-
> >          unwind_x86_64.h    |   4 -
> >          vmware_guestdump.c | 321 +++++++++++++++++++++++++++++++-------------
> >          x86_64.c           | 323 ++++++++++++++++++++++++++++++++++++++++-----
> >          13 files changed, 1247 insertions(+), 223 deletions(-)
> >
> >         --
> >         2.40.1
> >
>
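
The ppc64le failure quoted above, where frame #0 resolves to `?? ()` after switching tasks, is easy to spot by eye but tedious to check across many tasks. Below is a minimal sketch of how such a check could be scripted over captured `gdb bt` text. It is not part of crash or of this patchset; the helper names `parse_frames` and `unwind_looks_broken` are hypothetical, and the heuristic (no frames, or any frame named `??`) is an assumption about what counts as a failed unwind. Quote prefixes from the mail are assumed to have been stripped already.

```python
import re

# Matches the first line of a gdb backtrace frame, for example:
#   "#0  0xc0000000002bde04 in crash_setup_regs (...)"
#   "#7  proc_reg_write (file=<optimized out>, ...)"   (inlined frame, no PC)
FRAME_RE = re.compile(r"^#(\d+)\s+(?:(0x[0-9a-fA-F]+)\s+in\s+)?(\S+)")


def parse_frames(bt_text):
    """Return a list of (frame_number, pc_or_None, function_name) tuples."""
    frames = []
    for line in bt_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            pc = int(match.group(2), 16) if match.group(2) else None
            frames.append((int(match.group(1)), pc, match.group(3)))
    return frames


def unwind_looks_broken(bt_text):
    """Heuristic (an assumption): no frames at all, or any unresolved '??' frame."""
    frames = parse_frames(bt_text)
    return not frames or any(name == "??" for _, _, name in frames)


# Two excerpts taken from the transcripts in this thread.
GOOD_BT = """#0  0xc0000000002bde04 in crash_setup_regs
(newregs=0xc00000003264b858, oldregs=0x0) at
./arch/powerpc/include/asm/kexec.h:133
#1  0xc0000000002be4f8 in __crash_kexec (regs=0x0) at
kernel/crash_core.c:122"""

BAD_BT = "#0  0xc0000000a7f876a0 in ?? ()"

if __name__ == "__main__":
    print(unwind_looks_broken(GOOD_BT))
    print(unwind_looks_broken(BAD_BT))
```

Continuation lines (wrapped arguments and source locations) are skipped because they do not start with `#`, so the wrapped output pasted in this thread can be fed in as-is once the `>` quoting is removed.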

--
Crash-utility mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
Contribution Guidelines: https://github.com/crash-utility/crash/wiki
