Hi Aditya,
On Wed, Feb 14, 2024 at 11:10 AM Tao Liu <[email protected]> wrote:
>
> Hi Aditya,
>
> On Mon, Feb 12, 2024 at 4:11 PM Aditya Gupta <[email protected]> wrote:
> >
> > Hi Tao Liu,
> >
> > On Fri, Feb 09, 2024 at 09:15:26PM +0800, Tao Liu wrote:
> > > On Fri, Feb 02, 2024 at 07:04:14AM +0000, HAGIO KAZUHITO(萩尾 一仁) wrote:
> > > > Hi Aditya,
> > > >
> > > > Thank you for the work, it looks nicely done to me, except for the "info
> > > > threads" issue and the style of the gdb patch.
> > > >
> > > > Hi Tao Liu,
> > > >
> > > > I saw that you were interested in support for other architectures. I'd
> > > > like to test/support this also on x86_64 at the same time as ppc64 if
> > > > possible. Do you have any trial patch or plan?
> > >
> > > Hi Kazu & Aditya,
> > >
> > > Sorry for the delay. You can access my trial patch by
> > > https://github.com/liutgnu/crash-dev. There are some known issue:
> >
> > i tested it on a x86_64 and ppc64 vmcore i had, with these sequence of
> > commands (with some more 'gdb up' 'gdb down'):
> >
> > (crash) set
> > (crash) set gdb on
> > gdb> thread
> > gdb> bt
> > gdb> info threads
> > gdb> info threads
> > gdb> info locals
> > gdb> info variables irq_rover_lock
> > gdb> info args
> > gdb> thread 2
> > gdb> set gdb off
> > (crash) set
> > (crash) set -c 6
> > (crash) gdb thread
> > (crash) bt
> > (crash) gdb bt
> > (crash) frame
> > (crash) gdb up
> > (crash) gdb down
> > (crash) info locals
> >
> > had to rebase the patch onto my v8, the tree is available here:
> >
> > https://github.com/adi-g15-ibm/crash/tree/tmp-test-branch-20201
> >
> > >
> > > 1) Some vmcores will stack unwind fail:
> > > $ ./crash /var/crash/127.0.0.1-2023-11-10-18\:27\:30/vmcore ~/vmlinux
> > > KERNEL: /root/vmlinux [TAINTED]
> > > DUMPFILE: /var/crash/127.0.0.1-2023-11-10-18:27:30/vmcore [PARTIAL
> > > DUMP]
> > > CPUS: 1
> > > DATE: Fri Nov 10 18:27:26 CST 2023
> > > UPTIME: 00:10:49
> > > LOAD AVERAGE: 0.07, 0.11, 0.08
> > > TASKS: 133
> > > NODENAME: ibm-p8-kvm-03-guest-02.virt.pnr.lab.eng.rdu2.redhat.com
> > > RELEASE: 5.14.0-39.el9.x86_64
> > > VERSION: #1 SMP PREEMPT Fri Dec 24 00:07:58 EST 2021
> > > MACHINE: x86_64 (2303 Mhz)
> > > MEMORY: 4 GB
> > > PANIC: "Oops: 0002 [#1] PREEMPT SMP NOPTI" (check log for details)
> > > PID: 22722
> > > COMMAND: "insmod"
> > > TASK: ffff973e88408000 [THREAD_INFO: ffff973e88408000]
> > > CPU: 0
> > > STATE: TASK_RUNNING (PANIC)
> > >
> > > crash> set 1
> > > >>>> sp:ffffa5ee40013d00 bp:ffff973efbc2c800 ip:ffffffffb726a1b3
> > > >>>> sp:ffffa5ee40013d00 bp:ffff973efbc2c800 ip:ffffffffb726a1b3
> > > crash> gdb bt
> > > #0 0xffffffffb726a1b3 in context_switch (rf=0xffffa5ee40013d00,
> > > next=0xffff973e88408000, prev=0xffff973e801fb280, rq=0xffff973efbc2c800)
> > > at kernel/sched/core.c:4972
> > > #1 __schedule (sched_mode=0) at kernel/sched/core.c:6253
> > > #2 0x0004005300000004 in ?? ()
> > > crash> bt
> > > PID: 1 TASK: ffff973e801fb280 CPU: 0 COMMAND: "systemd"
> > > #0 [ffffa5ee40013d30] __schedule at ffffffffb726a1b3
> > > #1 [ffffa5ee40013d78] schedule at ffffffffb726a553
> > > #2 [ffffa5ee40013d90] schedule_hrtimeout_range_clock at ffffffffb726f294
> > > #3 [ffffa5ee40013e10] ep_poll at ffffffffb6bc00c4
> > > #4 [ffffa5ee40013eb0] do_epoll_wait at ffffffffb6bc01db
> > > #5 [ffffa5ee40013ee8] __x64_sys_epoll_wait at ffffffffb6bc09b0
> > > #6 [ffffa5ee40013f38] do_syscall_64 at ffffffffb725ea98
> > > #7 [ffffa5ee40013f50] entry_SYSCALL_64_after_hwframe at ffffffffb740007c
> > >
> > > The stack unwinding failed for "gdb bt", it only unwinded the
> > > "__schedule" function. The similarities for the failing is rsp and rbp
> > > which got from stack pointing to different stack frames:
> > > >>>> sp:ffffa5ee40013d00 bp:ffff973efbc2c800 ip:ffffffffb726a1b3
> > > crash> rd ffffa5ee40013d00 32
> > > ffffa5ee40013d00: ffff973ef8eef4a8 0000000000000000 ....>...........
> > > r15, r14 (struct inactive_task_frame)
> > > ffffa5ee40013d10: ffff973e88408000 ffff973e801fb280 ..@.>.......>...
> > > r13, r12
> > > ffffa5ee40013d20: ffff973e801fbd98 ffff973efbc2c800 ....>.......>...
> > > bx, bp
> > > ffffa5ee40013d30: ffffffffb726a1b3 ffff973e00000001 ..&.........>...
> > > ret_addr
> > > ffffa5ee40013d40: 0004005300000004 f915b72f0f356b00 ....S....k5./...
> > > ffffa5ee40013d50: ffff973e801fb280 ffff973e801fb280 ....>.......>...
> > > ffffa5ee40013d60: ffff973e801fb280 ffff973ef8eef4e0 ....>.......>...
> > > ffffa5ee40013d70: 0000000000000000 ffffffffb726a553 ........S.&.....
> > > ffffa5ee40013d80: ffff973ef8eef480 ffffa5ee40013e68 ....>...h>.@....
> > > ffffa5ee40013d90: ffffffffb726f294 ffffffffb6bbf092 ..&.............
> > > ffffa5ee40013da0: 000055d6d4081de0 00000054b6b4c6fd .....U......T...
> > > ffffa5ee40013db0: ffff973ef8eef4d0 ffffa5ee40013db8 ....>....=.@....
> > > ffffa5ee40013dc0: ffffa5ee40013db8 0000000000000000 .=.@............
> > > ffffa5ee40013dd0: ffff973e00000019 f915b72f0f356b00 ....>....k5./...
> > > ffffa5ee40013de0: f915b72f0f356b00 ffff973ef8eef480 .k5./.......>...
> > > ffffa5ee40013df0: ffffa5ee40013e68 ffff973e801fb280 h>.@........>...
> > >
> > > And bp is modified as follows:
> > >
> > > crash> dis schedule
> > > 0xffffffffb726a510 <schedule>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
> > > 0xffffffffb726a515 <schedule+5>: push %rbp
> > > 0xffffffffb726a516 <schedule+6>: mov %gs:0x16f40,%rbp <<<
> > > 0xffffffffb726a51f <schedule+15>: push %rbx
> > > 0xffffffffb726a520 <schedule+16>: mov 0x18(%rbp),%eax
> > > 0xffffffffb726a523 <schedule+19>: test %eax,%eax
> > >
> > > I'm not sure why in this case gdb cannot get the stack unwinded. Other
> > > than this case, the x86_64 stack unwinding works fine according to my
> > > test.
> > >
> > > 2) May break the original ppc64 patch.
> > >
> > > This x86_64 patch is based on the original v7 ppc stack unwinding patch.
> > > And it modified a bit of the original ppc64 patch code. I haven't fully
> > > tested the code in ppc64 arch.
> >
> > For now it seems to work fine on ppc64, I will go through the code
> > changes once. By the way, can you post a rebased patch series for x86 ?
>
> Sure, the rebased patch which is based on your v8 patchset is here:
> https://github.com/liutgnu/crash-dev(force updated). Currently I'm
> doing some testing against it, will post the results later...
The test results look OK...
Thanks,
Tao Liu
>
> Thanks,
> Tao Liu
>
> >
> > Thanks,
> > Aditya Gupta
> >
> > >
> > > Thanks,
> > > Tao Liu
> > >
> > >
> > > >
> > > > Thanks,
> > > > Kazu
> > > >
> > > > On 2024/01/05 16:30, Aditya Gupta wrote:
> > > > > The Problem:
> > > > > ============
> > > > >
> > > > > Currently crash is unable to show function arguments and local
> > > > > variables, as
> > > > > gdb can do. And functionality for moving between frames ('up'/'down')
> > > > > is not
> > > > > working in crash.
> > > > >
> > > > > Crash has 'gdb passthroughs' for things gdb can do, but the gdb
> > > > > passthroughs
> > > > > 'bt', 'frame', 'info locals', 'up', 'down' are not working either,
> > > > > due to
> > > > > gdb not getting the register values from
> > > > > `crash_target::fetch_registers`,
> > > > > which then uses `machdep->get_cpu_reg`, which is not implemented for
> > > > > PPC64
> > > > >
> > > > > Proposed Solution:
> > > > > ==================
> > > > >
> > > > > Fix the gdb passthroughs by implementing "machdep->get_cpu_reg" for
> > > > > PPC64.
> > > > > This way, "gdb mode in crash" will support this feature for both ELF
> > > > > and
> > > > > kdump-compressed vmcore formats, while "gdb" would only have
> > > > > supported ELF
> > > > > format
> > > > >
> > > > > This way other features of 'gdb', such as seeing
> > > > > backtraces/registers/variables/arguments/local variables, moving up
> > > > > and
> > > > > down stack frames, can be used with any ppc64 vmcore, irrespective of
> > > > > being ELF format or kdump-compressed format.
> > > > >
> > > > > Note: This doesn't support live debugging on ppc64, since registers
> > > > > are not
> > > > > available to be read
> > > > >
> > > > > Implications on Architectures:
> > > > > ====================================
> > > > >
> > > > > No architecture other than PPC64 has been affected, other than in
> > > > > case of
> > > > > 'frame' command
> > > > >
> > > > > As mentioned in patch #2, since frame will not be prohibited, so it
> > > > > will print:
> > > > >
> > > > > crash> frame
> > > > > #0 <unavailable> in ?? ()
> > > > >
> > > > > Instead of before prohibited message:
> > > > >
> > > > > crash> frame
> > > > > crash: prohibited gdb command: frame
> > > > >
> > > > > Major change will be in 'gdb mode' on PPC64, that it will print the
> > > > > frames, and
> > > > > local variables, instead of failing with errors showing no frame, or
> > > > > showing
> > > > > that couldn't get PC, it will be able to give all this information.
> > > > >
> > > > > Testing:
> > > > > ========
> > > > >
> > > > > Git tree with this patch series applied:
> > > > > https://github.com/adi-g15-ibm/crash/tree/stack-unwind-v6
> > > > >
> > > > > To test various gdb passthroughs:
> > > > >
> > > > > (crash) set
> > > > > (crash) set gdb on
> > > > > gdb> thread
> > > > > gdb> bt
> > > > > gdb> info threads
> > > > > gdb> info threads
> > > > > gdb> info locals
> > > > > gdb> info variables irq_rover_lock
> > > > > gdb> info args
> > > > > gdb> thread 2
> > > > > gdb> set gdb off
> > > > > (crash) set
> > > > > (crash) set -c 6
> > > > > (crash) gdb thread
> > > > > (crash) bt
> > > > > (crash) gdb bt
> > > > > (crash) frame
> > > > > (crash) up
> > > > > (crash) down
> > > > > (crash) info locals
> > > > >
> > > > > Known Issues:
> > > > > =============
> > > > >
> > > > > 1. In gdb mode, 'bt' might fail to show backtrace in few vmcores
> > > > > collected
> > > > > from older kernels. This is a known issue due to register
> > > > > mismatch, and
> > > > > its fix has been merged upstream:
> > > > >
> > > > > This can also cause some 'invalid kernel virtual address' errors
> > > > > during gdb
> > > > > unwinding the stack registers
> > > > >
> > > > > Commit:
> > > > > https://github.com/torvalds/linux/commit/b684c09f09e7a6af3794d4233ef785819e72db79
> > > > >
> > > > > Fixing GDB passthroughs on other architectures
> > > > > ==============================================
> > > > >
> > > > > Much of the work for making gdb passthroughs like 'gdb bt', 'gdb
> > > > > thread', 'gdb info locals' etc. has been done by the patches
> > > > > introducing
> > > > > 'machdep->get_cpu_reg' and this series fixing some issues in that.
> > > > >
> > > > > Other architectures should be able to fix these gdb functionalities by
> > > > > simply implementing 'machdep->get_cpu_reg (cpu, regno, ...)'.
> > > > >
> > > > > The reasoning behind that has been explained with a diagram in commit
> > > > > description of patch #1
> > > > >
> > > > > I will assist with my findings/observations fixing it on ppc64
> > > > > whenever needed.
> > > > >
> > > > > Changelog:
> > > > > ==========
> > > > >
> > > > > V6:
> > > > > + changes in patch #5: fix bug introduced in v5 that caused initial
> > > > > gdb thread
> > > > > to be thread 1
> > > > >
> > > > > V5:
> > > > > + changes in patch #1: made ppc64_get_cpu_reg static, and remove
> > > > > unreachable
> > > > > code
> > > > > + changes in patch #3: fixed typo 'ppc64_renum' instead of
> > > > > 'ppc64_regnum',
> > > > > remove unneeded if condition
> > > > > + changes in patch #5: implement refresh regcache on per thread,
> > > > > instead of all
> > > > > threads at once
> > > > >
> > > > > V4:
> > > > > + fix segmentation fault in live debugging (change in patch #1)
> > > > > + mention live debugging not supported in cover letter and patch #1
> > > > > + fixed some checkpatch warnings (change in patch #5)
> > > > >
> > > > > V3:
> > > > > + default gdb thread will be the crashing thread, instead of being
> > > > > thread '0'
> > > > > + synchronise crash cpu and gdb thread context
> > > > > + fix bug in gdb_interface, that replaced gdb's output stream, losing
> > > > > output in some cases, such as info threads and extra output in info
> > > > > variables
> > > > > + fix 'info threads'
> > > > >
> > > > > RFC V2:
> > > > > - removed patch implementing 'frame', 'up', 'down' in crash
> > > > > - updated the cover letter by removing the mention of those
> > > > > commands other
> > > > > than the respective gdb passthrough
> > > > >
> > > > > Aditya Gupta (5):
> > > > > ppc64: correct gdb passthroughs by implementing
> > > > > machdep->get_cpu_reg
> > > > > remove 'frame' from prohibited commands list
> > > > > synchronise cpu context changes between crash/gdb
> > > > > fix gdb_interface: restore gdb's output streams at end of
> > > > > gdb_interface
> > > > > fix 'info threads' command
> > > > >
> > > > > crash_target.c | 44 ++++++++++++++++
> > > > > defs.h | 130
> > > > > +++++++++++++++++++++++++++++++++++++++++++++++-
> > > > > gdb-10.2.patch | 110 +++++++++++++++++++++++++++++++++++++++-
> > > > > gdb_interface.c | 2 +-
> > > > > kernel.c | 47 +++++++++++++++--
> > > > > ppc64.c | 95 +++++++++++++++++++++++++++++++++--
> > > > > task.c | 14 ++++++
> > > > > tools.c | 2 +-
> > > > > 8 files changed, 434 insertions(+), 10 deletions(-)
> > > > >
> > >
> >
--
Crash-utility mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://${domain_name}/admin/lists/devel.lists.crash-utility.osci.io/
Contribution Guidelines: https://github.com/crash-utility/crash/wiki