Hi lianbo, On Wed, Sep 11, 2024 at 7:32 PM lijiang <liji...@redhat.com> wrote: > > On Wed, Sep 11, 2024 at 2:36 PM Tao Liu <l...@redhat.com> wrote: >> >> Hi lianbo, >> >> On Wed, Sep 11, 2024 at 2:26 PM lijiang <liji...@redhat.com> wrote: >> > >> > Hi, Tao >> > >> > Thank you for the update. >> > >> > The following patch is a regression issue, so I tend to discuss it as a >> > separate patch. >> > [PATCH v7 01/15] Fix the regression of cpumask_t for xen hyper > > > Can you also post v2 for this one? I have two comments about it: > [1] is it possible to not introduce the code related to hyper to a common > module such as tools.c? > [2] for IA64 arch, I saw the machdep->get_irq_affinity = > generic_get_irq_affinity is registered (see the ia64_init()) > > >> > >> > In addition, I found another issue in my tests(on ppc64le), the gdb bt can >> > display the back trace for the panic task, but when I switch to another >> > task, the gdb bt can not display the back trace: >> > >> > crash> gdb bt >> > #0 0xc0000000002bde04 in crash_setup_regs (newregs=0xc00000003264b858, >> > oldregs=0x0) at ./arch/powerpc/include/asm/kexec.h:133 >> > #1 0xc0000000002be4f8 in __crash_kexec (regs=0x0) at >> > kernel/crash_core.c:122 >> > #2 0xc00000000016c254 in panic (fmt=0xc0000000015eef20 "sysrq triggered >> > crash\n") at kernel/panic.c:373 >> > #3 0xc000000000a708b8 in sysrq_handle_crash (key=<optimized out>) at >> > drivers/tty/sysrq.c:154 >> > #4 0xc000000000a713d4 in __handle_sysrq (key=key@entry=99 'c', >> > check_mask=check_mask@entry=false) at drivers/tty/sysrq.c:612 >> > #5 0xc000000000a71e94 in write_sysrq_trigger (file=<optimized out>, >> > buf=<optimized out>, count=2, ppos=<optimized out>) at >> > drivers/tty/sysrq.c:1181 >> > #6 0xc00000000073260c in pde_write (pde=0xc00000000af9cc00, >> > file=<optimized out>, buf=<optimized out>, count=<optimized out>, >> > ppos=<optimized out>) at fs/proc/inode.c:334 >> > #7 proc_reg_write (file=<optimized out>, buf=<optimized out>, >> > 
count=<optimized out>, ppos=<optimized out>) at fs/proc/inode.c:346 >> > #8 0xc00000000063c0e0 in vfs_write (file=0xc0000000092d2900, >> > buf=0x10012536f60 <error: Cannot access memory at address 0x10012536f60>, >> > count=2, pos=0xc00000003264bd30) at fs/read_write.c:588 >> > #9 vfs_write (file=0xc0000000092d2900, buf=0x10012536f60 <error: Cannot >> > access memory at address 0x10012536f60>, count=<optimized out>, >> > pos=0xc00000003264bd30) at fs/read_write.c:570 >> > #10 0xc00000000063c690 in ksys_write (fd=<optimized out>, >> > buf=0x10012536f60 <error: Cannot access memory at address 0x10012536f60>, >> > count=2) at fs/read_write.c:643 >> > #11 0xc000000000031a28 in system_call_exception (regs=0xc00000003264be80, >> > r0=<optimized out>) at arch/powerpc/kernel/syscall.c:153 >> > #12 0xc00000000000d05c in system_call_vectored_common () at >> > arch/powerpc/kernel/interrupt_64.S:198 >> > >> > crash> ps >> > PID PPID CPU TASK ST %MEM VSZ RSS COMM >> > 0 0 0 c000000002bda980 RU 0.0 0 0 >> > [swapper/0] >> > > 0 0 1 c000000003864c80 RU 0.0 0 0 >> > > [swapper/1] >> > ... >> > 8017 923 0 c000000043a20000 IN 0.2 22528 16256 >> > sshd-session >> > 8025 8017 6 c000000032271880 IN 0.1 22784 11840 >> > sshd-session >> > > 8026 8025 0 c000000043a26600 RU 0.1 9664 6208 bash >> > ... >> > 11645 2 3 c000000032264c80 ID 0.0 0 0 >> > [kworker/u32:2] >> > 11738 6188 2 c00000003811b180 IN 0.1 43520 9408 pickup >> > 12326 2 0 c00000003226b280 ID 0.0 0 0 >> > [kworker/0:1] >> > 13112 6089 2 c00000000c809900 IN 0.0 7232 3456 sleep >> > >> > Let's take the "pickup" task as an example: >> > >> > crash> set 11738 >> > PID: 11738 >> > COMMAND: "pickup" >> > TASK: c00000003811b180 [THREAD_INFO: c00000003811b180] >> > CPU: 2 >> > STATE: TASK_INTERRUPTIBLE >> > >> > crash> gdb bt >> > #0 0xc0000000a7f876a0 in ?? () >> > gdb: gdb request failed: bt >> > crash> set gdb on >> > gdb: on >> > gdb> bt >> > #0 0xc0000000a7f876a0 in ?? 
()
>> > gdb>
>>
>> There is a bug in crash on ppc64 with newer kernel versions. The code for
>> determining the address of pt_regs from the stack is outdated; see the
>> following code from crash:
>>
>> ppc64.c:get_ppc64_frame()
>>     readmem(sp+STACK_FRAME_OVERHEAD, KVADDR, &regs, sizeof(struct ppc64_pt_regs),
>>             "PPC64 pt_regs", FAULT_ON_ERROR);
>>
>> The pt_regs is expected to be placed at sp+STACK_FRAME_OVERHEAD, aka sp+112.
>>
>> However, for kernels >= v6.2, this value is no longer appropriate:
>>
>> linux kernel:arch/powerpc/kernel/process.c:copy_thread():
>>     kregs = (struct pt_regs *)(sp + STACK_SWITCH_FRAME_REGS);
>>     p->thread.ksp = sp;
>>
>> linux kernel:arch/powerpc/include/asm/ptrace.h:
>>     #ifdef CONFIG_PPC64_ELF_ABI_V2
>>     #define STACK_FRAME_MIN_SIZE 32
>>     #define STACK_SWITCH_FRAME_REGS (STACK_FRAME_MIN_SIZE + 16)
>>
>
> Good findings, Tao.
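To make the offset change concrete, here is a minimal C sketch. It is not crash's actual code: the macro and function names (PPC64_*, kver, ppc64_pt_regs_offset, ppc64_pt_regs_addr) are hypothetical and chosen only for illustration. The values 112, 32 and 16 and the v6.2 cutoff are the ones quoted in this thread, and only the ELF ABI v2 case is covered.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; values are those quoted in the thread. */
#define PPC64_STACK_FRAME_OVERHEAD     112  /* pt_regs offset assumed by the old code */
#define PPC64_STACK_FRAME_MIN_SIZE      32  /* ELF ABI v2 value from asm/ptrace.h */
#define PPC64_STACK_SWITCH_FRAME_REGS  (PPC64_STACK_FRAME_MIN_SIZE + 16)  /* 48 == 0x30 */

/* Pack major.minor into one integer so two versions can be compared. */
static uint32_t kver(unsigned major, unsigned minor)
{
    assert(minor < 1024);  /* minor must fit in the low 10 bits */
    return ((uint32_t)major << 10) | minor;
}

/*
 * Offset of pt_regs from the saved stack pointer of a switched-out task.
 * Assumption taken from the thread: the layout changed in kernel v6.2.
 */
static uint64_t ppc64_pt_regs_offset(unsigned major, unsigned minor)
{
    if (kver(major, minor) >= kver(6, 2))
        return PPC64_STACK_SWITCH_FRAME_REGS;
    return PPC64_STACK_FRAME_OVERHEAD;
}

/* Address that would be handed to readmem() to fetch the task's pt_regs. */
static uint64_t ppc64_pt_regs_addr(uint64_t sp, unsigned major, unsigned minor)
{
    return sp + ppc64_pt_regs_offset(major, minor);
}
```

With such a helper, a caller would read pt_regs from ppc64_pt_regs_addr(sp, major, minor) instead of hard-coding sp+STACK_FRAME_OVERHEAD. Selecting by kernel version is only one option; a real fix might prefer to detect the layout from the dump itself.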
The patch to fix this is posted:
https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg01124.html

>
>>
>> If we apply the change to crash, i.e.:
>>     readmem(sp+0x30, KVADDR, &regs, sizeof(struct ppc64_pt_regs),
>>             "PPC64 pt_regs", FAULT_ON_ERROR);
>>
>> The stack unwinding can work as expected. You can test locally
>> to see if the above change works for you.
>>
>> So to me this bug isn't related to the gdb stack unwinding support;
>> it is just a bug related to newer kernel versions.
>>
>
> Agree. For the [PATCH V7 02/15] - [PATCH V7 15/15]: Ack.
>
> And I will put them in the merging queue. Once the current issue gets
> resolved, we can merge them together. Otherwise it may not work on ppc64 arch.

OK, agreed.

Thanks,
Tao Liu

>
>> I think we can post an individual patch to deal with this issue. Since
>> there are plenty of places in crash which use the old
>> STACK_FRAME_OVERHEAD value, maybe they all need to be updated.
>>
>
> Please go ahead.
>
> Thanks
> Lianbo
>
>>
>> Thanks,
>> Tao Liu
>>
>> >
>> > Anyway, I did the same test on x86_64 and aarch64, and it works as
>> > expected. Can you help to double check on ppc64 architecture?
>> > >> > X86 64: >> > crash> set 14599 >> > PID: 14599 >> > COMMAND: "pickup" >> > TASK: ffff8f57a0d7c180 [THREAD_INFO: ffff8f57a0d7c180] >> > CPU: 41 >> > STATE: TASK_INTERRUPTIBLE >> > crash> gdb bt >> > #0 0xffffffff8b3efe29 in context_switch (rq=0xffff8f6f1f835900, >> > prev=0xffff8f57a0d7c180, next=0xffff8f5786720000, rf=0xffff9df22fea7b80) >> > at kernel/sched/core.c:5208 >> > #1 __schedule (sched_mode=sched_mode@entry=0) at kernel/sched/core.c:6549 >> > #2 0xffffffff8b3f0217 in __schedule_loop (sched_mode=<optimized out>) at >> > kernel/sched/core.c:6626 >> > #3 schedule () at kernel/sched/core.c:6641 >> > #4 0xffffffff8b3f6eef in schedule_hrtimeout_range_clock >> > (expires=expires@entry=0xffff9df22fea7cb0, delta=<optimized out>, >> > delta@entry=99999999, mode=mode@entry=HRTIMER_MODE_ABS, >> > clock_id=clock_id@entry=1) at kernel/time/hrtimer.c:2293 >> > #5 0xffffffff8b3f7003 in schedule_hrtimeout_range >> > (expires=expires@entry=0xffff9df22fea7cb0, delta=delta@entry=99999999, >> > mode=mode@entry=HRTIMER_MODE_ABS) at kernel/time/hrtimer.c:2340 >> > #6 0xffffffff8aae301c in ep_poll (ep=0xffff8f5790d15d40, >> > events=events@entry=0x7ffea91b6b90, maxevents=maxevents@entry=100, >> > timeout=timeout@entry=0xffff9df22fea7d58) at fs/eventpoll.c:2062 >> > #7 0xffffffff8aae3138 in do_epoll_wait (epfd=epfd@entry=8, >> > events=events@entry=0x7ffea91b6b90, maxevents=maxevents@entry=100, >> > to=0xffff9df22fea7d58) at fs/eventpoll.c:2464 >> > #8 0xffffffff8aae44a1 in __do_sys_epoll_wait (epfd=<optimized out>, >> > events=0x7ffea91b6b90, maxevents=<optimized out>, timeout=<optimized out>) >> > at fs/eventpoll.c:2476 >> > #9 __se_sys_epoll_wait (epfd=<optimized out>, events=<optimized out>, >> > maxevents=<optimized out>, timeout=<optimized out>) at fs/eventpoll.c:2471 >> > #10 __x64_sys_epoll_wait (regs=<optimized out>) at fs/eventpoll.c:2471 >> > #11 0xffffffff8b3e293d in do_syscall_x64 (regs=0xffff9df22fea7f48, nr=232) >> > at arch/x86/entry/common.c:52 >> > #12 
do_syscall_64 (regs=0xffff9df22fea7f48, nr=232) at >> > arch/x86/entry/common.c:83 >> > #13 0xffffffff8b40012f in entry_SYSCALL_64 () at >> > arch/x86/entry/entry_64.S:121 >> > crash> >> > >> > >> > aarch64: >> > crash> set 9338 >> > PID: 9338 >> > COMMAND: "pickup" >> > TASK: ffff0000c7b05400 [THREAD_INFO: ffff0000c7b05400] >> > CPU: 3 >> > STATE: TASK_INTERRUPTIBLE >> > crash> gdb bt >> > #0 __switch_to (prev=<unavailable>, prev@entry=0xffff0000c7b05400, >> > next=next@entry=<unavailable>) at arch/arm64/kernel/process.c:555 >> > #1 0xffffafc5b5ebd744 in context_switch (rq=0xffff00077bbd0ec0, >> > prev=0xffff0000c7b05400, next=<unavailable>, rf=0xffff80008ac63a60) at >> > kernel/sched/core.c:5208 >> > #2 __schedule (sched_mode=sched_mode@entry=0) at kernel/sched/core.c:6549 >> > #3 0xffffafc5b5ebdc2c in __schedule_loop (sched_mode=<optimized out>) at >> > kernel/sched/core.c:6626 >> > #4 schedule () at kernel/sched/core.c:6641 >> > #5 0xffffafc5b5ec6030 in schedule_hrtimeout_range_clock >> > (expires=expires@entry=0xffff80008ac63be8, delta=delta@entry=99999999, >> > mode=mode@entry=HRTIMER_MODE_ABS, clock_id=clock_id@entry=1) at >> > kernel/time/hrtimer.c:2293 >> > #6 0xffffafc5b5ec618c in schedule_hrtimeout_range >> > (expires=expires@entry=0xffff80008ac63be8, delta=delta@entry=99999999, >> > mode=mode@entry=HRTIMER_MODE_ABS) at kernel/time/hrtimer.c:2340 >> > #7 0xffffafc5b545d33c in ep_poll (ep=<unavailable>, >> > events=events@entry=0xffffde5c3f68, maxevents=maxevents@entry=100, >> > timeout=timeout@entry=0xffff80008ac63ce0) at fs/eventpoll.c:2062 >> > #8 0xffffafc5b545d4e4 in do_epoll_wait (epfd=epfd@entry=8, >> > events=events@entry=0xffffde5c3f68, maxevents=maxevents@entry=100, >> > to=to@entry=0xffff80008ac63ce0) at fs/eventpoll.c:2464 >> > #9 0xffffafc5b545d534 in do_epoll_pwait (epfd=epfd@entry=8, >> > events=events@entry=0xffffde5c3f68, maxevents=maxevents@entry=100, >> > to=to@entry=0xffff80008ac63ce0, sigsetsize=<optimized out>, >> > 
sigmask=<optimized out>) at fs/eventpoll.c:2498 >> > #10 0xffffafc5b545e7c8 in do_epoll_pwait (epfd=8, events=0xffffde5c3f68, >> > maxevents=100, to=0xffff80008ac63ce0, sigmask=<optimized out>, >> > sigsetsize=<optimized out>) at fs/eventpoll.c:2495 >> > #11 __do_sys_epoll_pwait (epfd=8, events=0xffffde5c3f68, maxevents=100, >> > timeout=<optimized out>, sigmask=<optimized out>, sigsetsize=<optimized >> > out>) at fs/eventpoll.c:2511 >> > #12 __se_sys_epoll_pwait (epfd=8, events=281474412330856, maxevents=100, >> > timeout=<optimized out>, sigmask=<optimized out>, sigsetsize=<optimized >> > out>) at fs/eventpoll.c:2505 >> > #13 __arm64_sys_epoll_pwait (regs=<optimized out>) at fs/eventpoll.c:2505 >> > #14 0xffffafc5b4fa99bc in __invoke_syscall (regs=0xffff80008ac63eb0, >> > syscall_fn=<optimized out>) at arch/arm64/kernel/syscall.c:35 >> > #15 invoke_syscall (regs=regs@entry=0xffff80008ac63eb0, scno=<optimized >> > out>, sc_nr=sc_nr@entry=463, syscall_table=<optimized out>) at >> > arch/arm64/kernel/syscall.c:49 >> > #16 0xffffafc5b4fa9ac8 in el0_svc_common (sc_nr=463, >> > syscall_table=<optimized out>, regs=0xffff80008ac63eb0, scno=<optimized >> > out>) at arch/arm64/kernel/syscall.c:132 >> > #17 do_el0_svc (regs=regs@entry=0xffff80008ac63eb0) at >> > arch/arm64/kernel/syscall.c:151 >> > #18 0xffffafc5b5eb6fa4 in el0_svc (regs=0xffff80008ac63eb0) at >> > arch/arm64/kernel/entry-common.c:712 >> > #19 0xffffafc5b5eb74c0 in el0t_64_sync_handler (regs=<optimized out>) at >> > arch/arm64/kernel/entry-common.c:730 >> > #20 0xffffafc5b4f91634 in el0t_64_sync () at arch/arm64/kernel/entry.S:598 >> > crash> >> > >> > BTW: other changes are fine to me. 
>> > >> > Thanks >> > Lianbo >> > >> > On Wed, Sep 4, 2024 at 3:54 PM <devel-requ...@lists.crash-utility.osci.io> >> > wrote: >> >> >> >> Date: Wed, 4 Sep 2024 19:49:25 +1200 >> >> From: Tao Liu <l...@redhat.com> >> >> Subject: [Crash-utility] [PATCH v7 00/15] gdb stack unwinding support >> >> for crash utility >> >> To: devel@lists.crash-utility.osci.io >> >> Cc: Tao Liu <l...@redhat.com> >> >> Message-ID: <20240904074940.21331-1-l...@redhat.com> >> >> Content-Type: text/plain; charset=UTF-8 >> >> >> >> This patchset is a rebase/merged version of the following 3 patchsets: >> >> >> >> 1): [PATCH v10 0/5] Improve stack unwind on ppc64 [1] >> >> 2): [PATCH 0/5] x86_64 gdb stack unwinding support [2] >> >> 3): Clean up on top of one-thread-v2 [3] >> >> >> >> A complete description of gdb stack unwinding support for crash can be >> >> found in [1]. >> >> >> >> This patchset can be divided into the following 3 parts: >> >> >> >> 1) part1: preparations before stack unwinding support, some >> >> bugs/regressions found when drafting this patchset. >> >> 2) part2: common part for all CPU archs, mainly dealing with >> >> crash_target.c/gdb_interface.c files, in order to >> >> support different archs. >> >> 3) part3: arch specific, for each ppc64/x86_64/arm64/vmware >> >> stack unwinding support. 
>> >> >> >> === part 3 >> >> arm64: Add gdb stack unwinding support >> >> vmware_guestdump: Various format versions support >> >> x86_64: Add gdb stack unwinding support >> >> ppc64: correct gdb passthroughs by implementing >> >> machdep->get_current_task_reg >> >> >> >> === part 2 >> >> Conditionally output gdb stack unwinding stop reasons >> >> Stop stack unwinding at non-kernel address >> >> Print task pid/command instead of CPU index >> >> Rename get_cpu_reg to get_current_task_reg >> >> Let crash change gdb context >> >> Leave only one gdb thread for crash >> >> Remove 'frame' from prohibited commands list >> >> >> >> === part 1 >> >> Fix gdb_interface: restore gdb's output streams at end of gdb_interface >> >> x86_64: Fix invalid input "=>" for bt command >> >> Fix cpumask_t recursive dependence issue >> >> Fix the regression of cpumask_t for xen hyper >> >> === >> >> >> >> v7 -> v6: >> >> 1) Reorganise the patchset, re-divided them into 3 part against the >> >> previous 2 parts. >> >> 2) Re-dealed with the cpumask_t part, which solved the comment No.4 >> >> pointed out by lianbo in [4]. >> >> 3) Add conditional output for the failing message of gdb stack unwinding. >> >> see [PATCH 11/15] Conditionally output gdb stack unwinding stop reasons >> >> 4) Redraft the commit messages, updated some outdated info. >> >> 5) Merged "Let crash change gdb context" and "set_context(): check if >> >> context is already current" into one. >> >> >> >> [4]: >> >> https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg01067.html >> >> >> >> v6 -> v5: >> >> 1) Refactor patch 4 & 9, which changed the function signature of struct >> >> get_cpu_reg/get_current_task_reg, and let each patch compile with no >> >> error when added on. 
>> >> 2) Rebased the patchset on top of latest upstream: >> >> ("79b93ecb2e72ec Fix a "Bus error" issue caused by 'crash --osrelease' >> >> or >> >> crash loading") >> >> >> >> v5 -> v4: >> >> 1) Plenty of code refactoring based on Lianbo's comments on v4. >> >> 2) Removed the magic number when dealing with regs bitmap, see [6]. >> >> 3) Rebased the patchset on top of latest upstream: >> >> ("1c6da3eaff8207 arm64: Fix bt command show wrong stacktrace on >> >> ramdump source") >> >> >> >> v4 -> v3: >> >> Fixed the author issue in [PATCH v3 06/16] Fix gdb_interface: restore >> >> gdb's >> >> output streams at end of gdb_interface. >> >> >> >> v3 -> v2: >> >> 1) Updated CC list as pointed out in [4] >> >> 2) Compiling issues as in [5] >> >> >> >> v2 -> v1: >> >> 1) Added the patch: x86_64: Fix invalid input "=>" for bt command, >> >> thanks for Kazu's testing. >> >> 2) Modify the patch: x86_64: Add gdb stack unwinding support, added the >> >> pcp_save, spp_save and sp, for restoring the value in match of the >> >> original >> >> code logic. 
>> >> >> >> [1]: >> >> https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00469.html >> >> [2]: >> >> https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00488.html >> >> [3]: >> >> https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00554.html >> >> [4]: >> >> https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00681.html >> >> [5]: >> >> https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00715.html >> >> [6]: >> >> https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00819.html >> >> >> >> Aditya Gupta (3): >> >> Fix gdb_interface: restore gdb's output streams at end of >> >> gdb_interface >> >> Remove 'frame' from prohibited commands list >> >> ppc64: correct gdb passthroughs by implementing >> >> machdep->get_current_task_reg >> >> >> >> Alexey Makhalov (1): >> >> vmware_guestdump: Various format versions support >> >> >> >> Tao Liu (11): >> >> Fix the regression of cpumask_t for xen hyper >> >> Fix cpumask_t recursive dependence issue >> >> x86_64: Fix invalid input "=>" for bt command >> >> Leave only one gdb thread for crash >> >> Let crash change gdb context >> >> Rename get_cpu_reg to get_current_task_reg >> >> Print task pid/command instead of CPU index >> >> Stop stack unwinding at non-kernel address >> >> Conditionally output gdb stack unwinding stop reasons >> >> x86_64: Add gdb stack unwinding support >> >> arm64: Add gdb stack unwinding support >> >> >> >> arm64.c | 120 +++++++++++++++-- >> >> crash_target.c | 71 ++++++---- >> >> defs.h | 194 ++++++++++++++++++++++++++- >> >> gdb-10.2.patch | 96 ++++++++++++++ >> >> gdb_interface.c | 39 ++---- >> >> kernel.c | 63 +++++++-- >> >> ppc64.c | 174 +++++++++++++++++++++++- >> >> symbols.c | 15 +++ >> >> task.c | 34 +++-- >> >> tools.c | 16 ++- >> >> unwind_x86_64.h | 4 - >> >> vmware_guestdump.c | 321 +++++++++++++++++++++++++++++++------------- >> >> x86_64.c | 323 ++++++++++++++++++++++++++++++++++++++++----- >> >> 13 
files changed, 1247 insertions(+), 223 deletions(-) >> >> >> >> -- >> >> 2.40.1