On Mon, Oct 21, 2019 at 6:45 AM Michal Simek <michal.si...@xilinx.com> wrote:
>
> On 21. 10. 19 10:45, Quanyang Wang wrote:
> > Hi Michal,
> >
> > On 10/21/19 4:16 PM, Michal Simek wrote:
> >> On 21. 10. 19 7:50, quanyang.w...@windriver.com wrote:
> >>> From: Quanyang Wang <quanyang.w...@windriver.com>
> >>>
> >>> When running kdump with CONFIG_DEBUG_PREEMPT enabled, there is a call
> >>> trace as below:
> >>>
> >>> BUG: using smp_processor_id() in preemptible [00000000] code: sh/303
> >>> caller is machine_crash_shutdown+0x2c/0xe8
> >>> CPU: 0 PID: 303 Comm: sh Kdump: loaded Not tainted 5.2.20-yocto-standard #1
> >>> Hardware name: Xilinx Zynq Platform
> >>> [<80112ff4>] (unwind_backtrace) from [<8010ca4c>] (show_stack+0x18/0x1c)
> >>> [<8010ca4c>] (show_stack) from [<809b000c>] (dump_stack+0x70/0x8c)
> >>> [<809b000c>] (dump_stack) from [<80549a14>] (debug_smp_processor_id+0xd4/0x118)
> >>> [<80549a14>] (debug_smp_processor_id) from [<80111428>] (machine_crash_shutdown+0x2c/0xe8)
> >>> [<80111428>] (machine_crash_shutdown) from [<801afe24>] (__crash_kexec+0x70/0xd0)
> >>> [<801afe24>] (__crash_kexec) from [<801259b4>] (panic+0x110/0x324)
> >>> [<801259b4>] (panic) from [<805f7018>] (sysrq_handle_crash+0x18/0x1c)
> >>> [<805f7018>] (sysrq_handle_crash) from [<805f7584>] (__handle_sysrq+0x9c/0x14c)
> >>> [<805f7584>] (__handle_sysrq) from [<805f79e8>] (write_sysrq_trigger+0x5c/0x6c)
> >>> [<805f79e8>] (write_sysrq_trigger) from [<8031e850>] (proc_reg_write+0x78/0x8c)
> >>> [<8031e850>] (proc_reg_write) from [<802b1b28>] (vfs_write+0xc0/0x154)
> >>> [<802b1b28>] (vfs_write) from [<802b2a64>] (ksys_write+0x6c/0xd4)
> >>> [<802b2a64>] (ksys_write) from [<80101000>] (ret_fast_syscall+0x0/0x54)
> >>> Exception stack(0xba157fa8 to 0xba157ff0)
> >>> 7fa0: 00000002 005ab930 00000001 005ab930 00000002 00000000
> >>> 7fc0: 00000002 005ab930 76fa2290 00000004 76f3d124 76f3cc8c 00000000 00000000
> >>> 7fe0: 00000004 7edec940 76edbfff 76e67d16
> >>>
> >>> This is because the function disable_nonboot_cpus() is called in
> >>> order to make sure that the crash kernel runs on the boot CPU (cpu0),
> >>> and it re-enables local IRQs through the following call chain:
> >>>
> >>> disable_nonboot_cpus
> >>>   -> freeze_secondary_cpus
> >>>     -> _cpu_down
> >>>       -> percpu_down_write
> >>>         -> rcu_sync_enter
> >>>           -> spin_unlock_irq(&rsp->rss_lock)
> >>>             -> local_irq_enable()
> >>>
> >>> The functions called after disable_nonboot_cpus(), including
> >>> smp_processor_id(), then run with IRQs enabled, i.e. in preemptible
> >>> context, and this triggers the call trace.
> >>>
> >>> So move disable_nonboot_cpus() in front of local_irq_disable() to
> >>> avoid this; disable_nonboot_cpus() does not need to run in an atomic
> >>> context.
> >>>
> >>> Signed-off-by: Quanyang Wang <quanyang.w...@windriver.com>
> >>> ---
> >>>  arch/arm/kernel/machine_kexec.c | 3 ++-
> >>>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
> >>> index 654f2b1f9ac0..83d2025a4ab1 100644
> >>> --- a/arch/arm/kernel/machine_kexec.c
> >>> +++ b/arch/arm/kernel/machine_kexec.c
> >>> @@ -145,9 +145,10 @@ static void machine_kexec_mask_interrupts(void)
> >>>  void machine_crash_shutdown(struct pt_regs *regs)
> >>>  {
> >>> -	local_irq_disable();
> >>>  	disable_nonboot_cpus();
> >>> +	local_irq_disable();
> >>> +
> >>>  	crash_smp_send_stop();
> >>>  	crash_save_cpu(regs, smp_processor_id());
> >>>
> >> OK. Before this, can you please check whether your use cases work
> >> without disable_nonboot_cpus()? This patch was done a long time ago,
> >> when there was an issue with kexec. Back then I talked to the arm-soc
> >> maintainers about this, and they told me that mainline code should
> >> work fine without any need to call disable_nonboot_cpus().
> >> That means if kexec works fine, we can revert the original patch and
> >> use what mainline uses.
> >
> > It seems that the issue is still there. When the crash happens on cpu1
> > and the crash kernel runs on cpu1, it hangs. The log is below:
> >
> > root@xilinx-zynq:~# sh 1.sh
> > syscall kexec_file_load not available.
> > sysrq: Trigger a crash
> > Kernel panic - not syncing: sysrq triggered crash
> > CPU: 1 PID: 308 Comm: sh Kdump: loaded Not tainted 5.2.20-yocto-standard #4
> > Hardware name: Xilinx Zynq Platform
> > [<80112eb0>] (unwind_backtrace) from [<8010cc04>] (show_stack+0x18/0x1c)
> > [<8010cc04>] (show_stack) from [<8094f8f4>] (dump_stack+0x70/0x8c)
> > [<8094f8f4>] (dump_stack) from [<801256f4>] (panic+0xf8/0x320)
> > [<801256f4>] (panic) from [<805dbeb0>] (sysrq_handle_crash+0x18/0x1c)
> > [<805dbeb0>] (sysrq_handle_crash) from [<805dc3b8>] (__handle_sysrq+0x9c/0x148)
> > [<805dc3b8>] (__handle_sysrq) from [<805dc804>] (write_sysrq_trigger+0x5c/0x6c)
> > [<805dc804>] (write_sysrq_trigger) from [<8031b040>] (proc_reg_write+0x78/0x8c)
> > [<8031b040>] (proc_reg_write) from [<802aeec4>] (vfs_write+0xc0/0x154)
> > [<802aeec4>] (vfs_write) from [<802afd18>] (ksys_write+0x64/0xc8)
> > [<802afd18>] (ksys_write) from [<80101000>] (ret_fast_syscall+0x0/0x54)
> > Exception stack(0xb905bfa8 to 0xb905bff0)
> > bfa0: 00000002 0059afa0 00000001 0059afa0 00000002 00000000
> > bfc0: 00000002 0059afa0 76f8e290 00000004 76f29124 76f28c8c 00000000 00000000
> > bfe0: 00000004 7eb858c0 76ec7fff 76e53d16
> > CPU 0 will stop doing anything useful since another CPU has crashed
> > Loading crashdump kernel...
> > Bye!
> > Booting Linux on physical CPU 0x1
> > Linux version 5.2.20-yocto-standard (oe-user@oe-host) (gcc version 9.2.0 (GCC)) #1 SMP PREEMPT Thu Oct 17 08:15:14 UTC 2019
> > CPU: ARMv7 Processor [413fc090] revision 0 (ARMv7), cr=18c5387d
> > CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
> > OF: fdt: Machine model: Xilinx ZC706 board
> > OF: fdt: Ignoring memory range 0x0 - 0x8000000
> > printk: debug: ignoring loglevel setting.
> > printk: bootconsole [earlycon0] enabled
> > Memory policy: Data cache writealloc
> > cma: Reserved 16 MiB at 0x16c00000
> > On node 0 totalpages: 65280
> > Normal zone: 574 pages used for memmap
> > Normal zone: 0 pages reserved
> > Normal zone: 65280 pages, LIFO batch:15
> > percpu: Embedded 19 pages/cpu s47756 r8192 d21876 u77824
> > pcpu-alloc: s47756 r8192 d21876 u77824 alloc=19*4096
> > pcpu-alloc: [0] 0 [0] 1
> > Built 1 zonelists, mobility grouping on. Total pages: 64706
> > Kernel command line: console=ttyPS0,115200n8 root=/dev/nfs rw
> > nfsroot=128.224.165.20:/export/pxeboot/vlm-boards/22009/rootfs,v3,tcp
> > ip=128.224.179.217:128.224.165.20:128.224.178.1:255.255.254.0:zc702:eth0:off
> > ignore_loglevel earlyprintk noinitrd selinux=0 enforcing=0 kmemleak=on
> > elfcorehdr=0x17f00000 mem=261120K
> > Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
> > Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
> > Memory: 227332K/261120K available (9216K kernel code, 725K rwdata, 2284K rodata, 1024K init, 567K bss, 17404K reserved, 16384K cma-reserved)
> > SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
> > ftrace: allocating 35203 entries in 69 pages
> > rcu: Preemptible hierarchical RCU implementation.
> > rcu: RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=2.
> > Tasks RCU enabled.
> > rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
> > rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
> > NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16
> > efuse mapped to (ptrval)
> > slcr mapped to (ptrval)
> > L2C: platform provided aux values match the hardware, so have no effect. Please remove them.
> > L2C-310 erratum 769419 enabled
> > L2C-310 enabling early BRESP for Cortex-A9
> > L2C-310: enabling full line of zeros but not enabled in Cortex-A9
> > L2C-310 ID prefetch enabled, offset 1 lines
> > L2C-310 dynamic clock gating enabled, standby mode enabled
> > L2C-310 cache controller enabled, 8 ways, 512 kB
> > L2C-310: CACHE_ID 0x410000c8, AUX_CTRL 0x76760001
> > random: get_random_bytes called from start_kernel+0x2b0/0x4c4 with crng_init=0
> > zynq_clock_init: clkc starts at (ptrval)
> > Zynq clock init
> > sched_clock: 64 bits at 333MHz, resolution 3ns, wraps every 4398046511103ns
> > clocksource: arm_global_timer: mask: 0xffffffffffffffff max_cycles: 0x4ce07af025, max_idle_ns: 440795209040 ns
> > Switching to timer-based delay loop, resolution 3ns
> > clocksource: ttc_clocksource: mask: 0xffff max_cycles: 0xffff, max_idle_ns: 537538477 ns
> > timer #0 at (ptrval), irq=17
> > Console: colour dummy device 80x30
> > Calibrating delay loop (skipped), value calculated using timer frequency.. 666.66 BogoMIPS (lpj=3333333)
> > pid_max: default: 32768 minimum: 301
> > LSM: Security Framework initializing
> > Mount-cache hash table entries: 1024 (order: 0, 4096 bytes)
> > Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes)
> > CPU: Testing write buffer coherency: ok
> > CPU0: Spectre v2: using BPIALL workaround
> > CPU0: thread -1, cpu 1, socket 0, mpidr 80000001
> > Setting up static identity map for 0x8100000 - 0x8100060
> > rcu: Hierarchical SRCU implementation.
> > smp: Bringing up secondary CPUs ...
>
> OK. Can you send the content of your 1.sh script?
>
> Anyway, the patch looks good to me.
> Bruce: Feel free to take it. I will add it to the Xilinx tree too.
Ack'd. Will pull it into my queue.

Bruce

>
> Thanks,
> Michal
>

-- 
- Thou shalt not follow the NULL pointer, for chaos and madness await thee at its end
- "Use the force Harry" - Gandalf, Star Trek II

-- 
_______________________________________________
linux-yocto mailing list
linux-yocto@yoctoproject.org
https://lists.yoctoproject.org/listinfo/linux-yocto