Re: [PATCH 0/7] Kexec-tools: Improve RISC-V port
On 2023/9/20 19:56, Simon Horman wrote:
> On Fri, Sep 15, 2023 at 11:50:06AM +0800, Song Shuai wrote:
>> Hi,
>>
>> This series is created to improve RISC-V port of kexec-tools, and is
>> based on the horms/kexec-tools:build-test-riscv-v2 branch.
>
> In my mind the big question is how to move RISC-V support from that
> branch, to being merged into main. IIRC there were some issues that
> needed to be addressed. Perhaps they are all addressed by this series,
> and with some appropriate squashing we can move forwards with a series
> based on main?

Hi, Simon and Nick:

I squashed the first four patches into a single "RISC-V: Some fixes for
riscv port" patch, then took horms/main as the base and collected the
2 patches from the horms/build-test-riscv-v2 branch together with this
series.

Here are the GitHub link and all commits for RISC-V.

https://github.com/sugarfillet/kexec-tools/commits/main_rv

5dc133e RISC-V: Support loading Image binary file
b042f6d RISC-V: Separate elf_riscv_find_pbase out
8f344c7 RISC-V: Enable kexec_file_load syscall
7d4b982 RISC-V: Some fixes for riscv port
3205c1c local: RISC-V: distribute purgatory/riscv/Makefile
54f9daf RISC-V: Add support for riscv kexec/kdump on kexec-tools

Since I didn't find the issues/fixes Nick mentioned in these commits,
I would prefer to merge them into horms/main and let more kexec/kdump
users help improve and fix up the RISC-V port.

I would like to hear your advice.

>> For your convenience, here is my Github branch for kexec-tools:
>>
>> https://github.com/sugarfillet/kexec-tools/commits/rv-Image
>>
>> The first four patches fix some build or runtime issues:
>>
>> RISC-V: Use linux,usable-memory-range for crash kernel
>> RISC-V: Fix the undeclared ‘EM_RISCV’ build failure
>> RISC-V: Get memory ranges from iomem
>> RISC-V: Correct the usage of command line option
>>
>> The last three patches enable the kexec_file_load syscall to load
>> vmlinux and support loading Image binary file for two syscalls.
>>
>> RISC-V: Enable kexe_file_load
>> RISC-V: Separate elf_riscv_find_pbase out
>> RISC-V: Support loading Image binary file
>>
>> Note that:
>>
>> RISC-V Linux kexec_file_load's support for Image file has been sent
>> out but not merged [1].
>>
>> [1]: https://lore.kernel.org/linux-riscv/20230914020044.1397356-1-songshuaish...@tinylab.org/T/#t
>>
>> Li Zhengyu (1):
>>   RISC-V: Enable kexe_file_load
>>
>> Song Shuai (6):
>>   RISC-V: Use linux,usable-memory-range for crash kernel
>>   RISC-V: Fix the undeclared ‘EM_RISCV’ build failure
>>   RISC-V: Get memory ranges from iomem
>>   RISC-V: Correct the usage of command line option
>>   RISC-V: Separate elf_riscv_find_pbase out
>>   RISC-V: Support loading Image binary file
>>
>>  kexec/arch/riscv/Makefile            |   2 +
>>  kexec/arch/riscv/crashdump-riscv.c   |   2 +-
>>  kexec/arch/riscv/image-header.h      |  88 ++
>>  kexec/arch/riscv/iomem.h             |  10 ++
>>  kexec/arch/riscv/kexec-elf-riscv.c   |  77 +---
>>  kexec/arch/riscv/kexec-image-riscv.c |  95 +++
>>  kexec/arch/riscv/kexec-riscv.c       | 176 ++-
>>  kexec/arch/riscv/kexec-riscv.h       |  21
>>  kexec/kexec-syscall.h                |   3 +
>>  9 files changed, 368 insertions(+), 106 deletions(-)
>>  create mode 100644 kexec/arch/riscv/image-header.h
>>  create mode 100644 kexec/arch/riscv/iomem.h
>>  create mode 100644 kexec/arch/riscv/kexec-image-riscv.c
>>
>> --
>> 2.20.1

--
Thanks
Song Shuai

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
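The series above adds support for loading a raw Image file next to vmlinux. As a rough illustration of how a loader might tell the two apart, here is a minimal sketch that probes for the RISC-V Image header. It assumes the 64-byte header layout described in the Linux kernel's RISC-V boot image header documentation, where the 4-byte "magic2" field ("RSC\x05") sits at byte offset 56. The function and macro names are hypothetical and are not taken from the patches in this series.

```c
#include <stddef.h>
#include <string.h>

/* Assumed layout: 64-byte header, magic2 ("RSC\x05") at byte offset 56. */
#define RISCV_IMAGE_HEADER_SIZE   64
#define RISCV_IMAGE_MAGIC2_OFFSET 56

/*
 * Hypothetical probe helper: returns 1 if buf looks like a RISC-V Image,
 * 0 otherwise. Comparing raw bytes keeps the check independent of the
 * host's endianness.
 */
static int looks_like_riscv_image(const unsigned char *buf, size_t len)
{
    static const unsigned char magic2[4] = { 'R', 'S', 'C', 0x05 };

    if (buf == NULL || len < RISCV_IMAGE_HEADER_SIZE)
        return 0;

    return memcmp(buf + RISCV_IMAGE_MAGIC2_OFFSET, magic2,
                  sizeof(magic2)) == 0;
}
```

A loader would typically run a check like this before falling back to ELF parsing; the real probe in kexec-image-riscv.c may differ.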
[PATCH] kexec/loongarch64: fix 'make dist' file loss issue
The Makefile omits the iomem.h file, causing the archive file generated
by 'make dist' to lose iomem.h. This patch is used to fix this problem.

Signed-off-by: Ming Wang
---
 kexec/arch/loongarch/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kexec/arch/loongarch/Makefile b/kexec/arch/loongarch/Makefile
index cee7e56..f91d0ba 100644
--- a/kexec/arch/loongarch/Makefile
+++ b/kexec/arch/loongarch/Makefile
@@ -19,5 +19,6 @@ loongarch_VIRT_TO_PHYS =
 dist += kexec/arch/loongarch/Makefile $(loongarch_KEXEC_SRCS) \
         kexec/arch/loongarch/kexec-loongarch.h \
         kexec/arch/loongarch/image-header.h \
+        kexec/arch/loongarch/iomem.h \
         kexec/arch/loongarch/crashdump-loongarch.h \
         kexec/arch/loongarch/include/arch/options.h

base-commit: 6419b008fde783fd0cc2cc266bd1c9cf35e99a0e
--
2.39.2
Re: [kexec-tools] Archive file is missed iomem.h file under loongarch architecture.
Hi, Simon

On 10/10/23 21:01, Simon Horman wrote:
> On Mon, Oct 09, 2023 at 05:47:43PM +0800, Ming Wang wrote:
>> Hi, maintainers,
>>
>> I get the kexec-tools 2.0.27 from
>> http://kernel.org/pub/linux/utils/kernel/kexec/kexec-tools-2.0.27.tar.gz,
>>
>> But I noticed that the kexec-tools-2.0.27/kexec/arch/loongarch/iomem.h
>> file was missing from this archive.
>>
>> This causes build errors in many distributions, like debian. The error
>> message is as follows,
>>
>> make[1]: *** [Makefile:123: kexec/arch/loongarch/crashdump-loongarch.o] Error 1
>> kexec/arch/loongarch/kexec-loongarch.c:27:10: fatal error: iomem.h: No such file or directory
>>    27 | #include "iomem.h"
>>
>> See also:
>> https://buildd.debian.org/status/package.php?p=kexec-tools=sid
>>
>> Can this archive be repaired and updated?
>>
>> Thanks, Ming
>
> Hi,
>
> I need to think about how to deal with this from a release PoV.
> But can you check if the patch below resolves your problem?
>
> diff --git a/kexec/arch/loongarch/Makefile b/kexec/arch/loongarch/Makefile
> index cee7e569a2a2..f91d0baf049a 100644
> --- a/kexec/arch/loongarch/Makefile
> +++ b/kexec/arch/loongarch/Makefile
> @@ -19,5 +19,6 @@ loongarch_VIRT_TO_PHYS =
>  dist += kexec/arch/loongarch/Makefile $(loongarch_KEXEC_SRCS) \
>          kexec/arch/loongarch/kexec-loongarch.h \
>          kexec/arch/loongarch/image-header.h \
> +        kexec/arch/loongarch/iomem.h \
>          kexec/arch/loongarch/crashdump-loongarch.h \
>          kexec/arch/loongarch/include/arch/options.h

After applying this patch and running make dist, it's OK. Sorry, I was
stupid. This fixes the problem of the missing file.
Re: [PATCHv8 2/5] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt
On Tue, Oct 10, 2023 at 04:07:00PM +0530, Hari Bathini wrote: > > > On 09/10/23 5:00 pm, Pingfan Liu wrote: > > *** Idea *** > > For kexec -p, the boot cpu can be not the cpu0, this causes the problem > > of allocating memory for paca_ptrs[]. However, in theory, there is no > > requirement to assign cpu's logical id as its present sequence in the > > device tree. But there is something like cpu_first_thread_sibling(), > > which makes assumption on the mapping inside a core. Hence partially > > loosening the mapping, i.e. unbind the mapping of core while keep the > > mapping inside a core. > > > > *** Implement *** > > At this early stage, there are plenty of memory to utilize. Hence, this > > patch allocates interim memory to link the cpu info on a list, then > > reorder cpus by changing the list head. As a result, there is a rotate > > shift between the sequence number in dt and the cpu logical number. > > > > *** Result *** > > After this patch, a boot-cpu's logical id will always be mapped into the > > range [0,threads_per_core). > > > > Besides this, at this phase, all threads in the boot core are forced to > > be onlined. This restriction will be lifted in a later patch with > > extra effort. 
> > > > Signed-off-by: Pingfan Liu > > Cc: Michael Ellerman > > Cc: Nicholas Piggin > > Cc: Christophe Leroy > > Cc: Mahesh Salgaonkar > > Cc: Wen Xiong > > Cc: Baoquan He > > Cc: Ming Lei > > Cc: kexec@lists.infradead.org > > To: linuxppc-...@lists.ozlabs.org > > --- > > arch/powerpc/kernel/prom.c | 25 + > > arch/powerpc/kernel/setup-common.c | 87 +++--- > > 2 files changed, 85 insertions(+), 27 deletions(-) > > > > diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c > > index ec82f5bda908..87272a2d8c10 100644 > > --- a/arch/powerpc/kernel/prom.c > > +++ b/arch/powerpc/kernel/prom.c > > @@ -76,7 +76,9 @@ u64 ppc64_rma_size; > > unsigned int boot_cpu_node_count __ro_after_init; > > #endif > > static phys_addr_t first_memblock_size; > > +#ifdef CONFIG_SMP > > static int __initdata boot_cpu_count; > > +#endif > > static int __init early_parse_mem(char *p) > > { > > @@ -331,8 +333,7 @@ static int __init early_init_dt_scan_cpus(unsigned long > > node, > > const __be32 *intserv; > > int i, nthreads; > > int len; > > - int found = -1; > > - int found_thread = 0; > > + bool found = false; > > /* We are scanning "cpu" nodes only */ > > if (type == NULL || strcmp(type, "cpu") != 0) > > @@ -355,8 +356,15 @@ static int __init early_init_dt_scan_cpus(unsigned > > long node, > > for (i = 0; i < nthreads; i++) { > > if (be32_to_cpu(intserv[i]) == > > fdt_boot_cpuid_phys(initial_boot_params)) { > > - found = boot_cpu_count; > > - found_thread = i; > > + /* > > +* always map the boot-cpu logical id into the > > +* range of [0, thread_per_core) > > +*/ > > + boot_cpuid = i; > > + found = true; > > + /* This works around the hole in paca_ptrs[]. 
*/ > > + if (nr_cpu_ids < nthreads) > > + set_nr_cpu_ids(nthreads); > > } > > #ifdef CONFIG_SMP > > /* logical cpu id is always 0 on UP kernels */ > > @@ -365,14 +373,13 @@ static int __init early_init_dt_scan_cpus(unsigned > > long node, > > } > > /* Not the boot CPU */ > > - if (found < 0) > > + if (!found) > > return 0; > > - DBG("boot cpu: logical %d physical %d\n", found, > > - be32_to_cpu(intserv[found_thread])); > > - boot_cpuid = found; > > + DBG("boot cpu: logical %d physical %d\n", boot_cpuid, > > + be32_to_cpu(intserv[boot_cpuid])); > > - boot_cpu_hwid = be32_to_cpu(intserv[found_thread]); > > + boot_cpu_hwid = be32_to_cpu(intserv[boot_cpuid]); > > /* > > * PAPR defines "logical" PVR values for cpus that > > diff --git a/arch/powerpc/kernel/setup-common.c > > b/arch/powerpc/kernel/setup-common.c > > index 1b19a9815672..81291e13dec0 100644 > > --- a/arch/powerpc/kernel/setup-common.c > > +++ b/arch/powerpc/kernel/setup-common.c > > @@ -36,6 +36,7 @@ > > #include > > #include > > #include > > +#include > > #include > > #include > > #include > > @@ -425,6 +426,13 @@ static void __init cpu_init_thread_core_maps(int tpc) > > u32 *cpu_to_phys_id = NULL; > > +struct interrupt_server_node { > > + struct list_head node; > > + boolavail; > > + int len; > > + __be32 *intserv; > > +}; > > + > > /** > >* setup_cpu_maps - initialize the following cpu maps: > >* cpu_possible_mask > > @@ -446,11 +454,16 @@ u32 *cpu_to_phys_id = NULL; > > void __init smp_setup_cpu_maps(void) > > { > > struct device_node *dn; > > - int cpu = 0; > > - int nthreads = 1; > > + int shift = 0, cpu =
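The "rotate shift" described in the commit message quoted above can be pictured with a small sketch. It assumes that every core has the same number of threads and that cores appear in the device tree in sequence order. The function names and the total of 64 CPUs are illustrative only; the patch itself reorders a linked list rather than calling helpers like these.

```c
/*
 * Hypothetical helper: the shift is the dt sequence number of the first
 * thread in the boot core, so the whole core moves to the front.
 */
static int boot_core_shift(int boot_dt_seq, int threads_per_core)
{
    return (boot_dt_seq / threads_per_core) * threads_per_core;
}

/*
 * Hypothetical helper: rotate a dt sequence number into a logical cpu
 * number. CPUs that precede the boot core in the dt wrap to the end.
 */
static int dt_seq_to_logical(int dt_seq, int shift, int total_cpus)
{
    return (dt_seq - shift + total_cpus) % total_cpus;
}
```

For example, with 8 threads per core and a boot cpu at dt sequence 58, the shift is 56 and the boot cpu gets logical id 2, which lies inside [0, threads_per_core) while keeping its position within the core.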
Re: [PATCHv8 3/5] powerpc/setup: Handle the case when boot_cpuid greater than nr_cpus
On Tue, Oct 10, 2023 at 01:56:13PM +0530, Hari Bathini wrote:
>
>
> On 09/10/23 5:00 pm, Pingfan Liu wrote:
> > If the boot_cpuid is smaller than nr_cpus, it requires extra effort to
> > ensure the boot_cpu is in cpu_present_mask. This can be achieved by
> > reserving the last quota for the boot cpu.
> >
> > Note: the restriction on nr_cpus will be lifted with more effort in the
> > successive patches
> >
> > Signed-off-by: Pingfan Liu
> > Cc: Michael Ellerman
> > Cc: Nicholas Piggin
> > Cc: Christophe Leroy
> > Cc: Mahesh Salgaonkar
> > Cc: Wen Xiong
> > Cc: Baoquan He
> > Cc: Ming Lei
> > Cc: kexec@lists.infradead.org
> > To: linuxppc-...@lists.ozlabs.org
> > ---
> >  arch/powerpc/kernel/setup-common.c | 25 ++---
> >  1 file changed, 22 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
> > index 81291e13dec0..f9ef0a2666b0 100644
> > --- a/arch/powerpc/kernel/setup-common.c
> > +++ b/arch/powerpc/kernel/setup-common.c
> > @@ -454,8 +454,8 @@ struct interrupt_server_node {
> >  void __init smp_setup_cpu_maps(void)
> >  {
> >      struct device_node *dn;
> > -    int shift = 0, cpu = 0;
> > -    int j, nthreads = 1;
> > +    int terminate, shift = 0, cpu = 0;
> > +    int j, bt_thread = 0, nthreads = 1;
> >      int len;
> >      struct interrupt_server_node *intserv_node, *n;
> >      struct list_head *bt_node, head;
> > @@ -518,6 +518,7 @@ void __init smp_setup_cpu_maps(void)
> >          for (j = 0 ; j < nthreads; j++) {
> >              if (be32_to_cpu(intserv[j]) == boot_cpu_hwid) {
> >                  bt_node = &intserv_node->node;
> > +                bt_thread = j;
> >                  found_boot_cpu = true;
> >                  /*
> >                   * Record the round-shift between dt
> > @@ -537,11 +538,21 @@ void __init smp_setup_cpu_maps(void)
> >      /* Select the primary thread, the boot cpu's slibing, as the logic 0 */
> >      list_add_tail(&head, bt_node);
> >      pr_info("the round shift between dt seq and the cpu logic number: %d\n", shift);
> > +    terminate = nr_cpu_ids;
> >      list_for_each_entry(intserv_node, &head, node) {
> > +        j = 0;
>
> > +        /* Choose a start point to cover the boot cpu */
> > +        if (nr_cpu_ids - 1 < bt_thread) {
> > +            /*
> > +             * The processor core puts assumption on the thread id,
> > +             * not to breach the assumption.
> > +             */
> > +            terminate = nr_cpu_ids - 1;
>
> nthreads is anyway assumed to be same for all cores. So, enforcing
> nr_cpu_ids to a minimum of nthreads (and multiple of nthreads) should
> make the code much simpler without the need for above check and the
> other complexities addressed in the subsequent patches...
>

Indeed, this series can be split into two parts, [1-2/5] and [3-5/5].
In [1-2/5], if smaller, the nr_cpu_ids is enforced to be equal to
nthreads. I will make it align upward on nthreads in the next version.
So [1-2/5] can be totally independent from the rest of the patches in
this series.

From an engineer's perspective, [3-5/5] are added to maintain the
nr_cpus semantics. (Finally, nr_cpus=1 can be achieved, but that
requires effort in other subsystems.)

Testing result on my Power9 machine with SMT=4

-1. taskset -c 4 bash -c 'echo c > /proc/sysrq-trigger'
kdump:/# cat /proc/meminfo | grep Percpu
Percpu: 896 kB
kdump:/# cat /sys/devices/system/cpu/possible
0

-2. taskset -c 5 bash -c 'echo c > /proc/sysrq-trigger'
kdump:/# cat /proc/meminfo | grep Percpu
Percpu: 1792 kB
kdump:/# cat /sys/devices/system/cpu/possible
0-1

-3. taskset -c 6 bash -c 'echo c > /proc/sysrq-trigger'
kdump:/# cat /proc/meminfo | grep Percpu
Percpu: 1792 kB
kdump:/# cat /sys/devices/system/cpu/possible
0,2

-4. taskset -c 7 bash -c 'echo c > /proc/sysrq-trigger'
kdump:/# cat /proc/meminfo | grep Percpu
Percpu: 1792 kB
kdump:/# cat /sys/devices/system/cpu/possible
0,3

Thanks,
Pingfan
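The rounding discussed in this exchange (nr_cpu_ids at least nthreads, and a multiple of nthreads) is plain align-up arithmetic. The sketch below is only an illustration of that arithmetic; the helper name is hypothetical and the next version of the patch may implement it differently, for example with the kernel's own alignment macros.

```c
/*
 * Hypothetical helper: round nr_cpu_ids up to a multiple of nthreads,
 * with a minimum of one full core. nthreads == 0 is treated as
 * "no constraint" to avoid dividing by zero.
 */
static unsigned int align_nr_cpu_ids(unsigned int nr_cpu_ids,
                                     unsigned int nthreads)
{
    unsigned int aligned;

    if (nthreads == 0)
        return nr_cpu_ids;

    aligned = ((nr_cpu_ids + nthreads - 1) / nthreads) * nthreads;
    return aligned < nthreads ? nthreads : aligned;
}
```

With SMT=4, nr_cpus=1 would be raised to 4 and nr_cpus=5 to 8, so every thread of a core has a slot.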
Re: [kexec-tools] Archive file is missed iomem.h file under loongarch architecture.
Hi, Simon

Thank you for your reply.

On 10/10/23 21:01, Simon Horman wrote:
> On Mon, Oct 09, 2023 at 05:47:43PM +0800, Ming Wang wrote:
>> Hi, maintainers,
>>
>> I get the kexec-tools 2.0.27 from
>> http://kernel.org/pub/linux/utils/kernel/kexec/kexec-tools-2.0.27.tar.gz,
>>
>> But I noticed that the kexec-tools-2.0.27/kexec/arch/loongarch/iomem.h
>> file was missing from this archive.
>>
>> This causes build errors in many distributions, like debian. The error
>> message is as follows,
>>
>> make[1]: *** [Makefile:123: kexec/arch/loongarch/crashdump-loongarch.o] Error 1
>> kexec/arch/loongarch/kexec-loongarch.c:27:10: fatal error: iomem.h: No such file or directory
>>    27 | #include "iomem.h"
>>
>> See also:
>> https://buildd.debian.org/status/package.php?p=kexec-tools=sid
>>
>> Can this archive be repaired and updated?
>>
>> Thanks, Ming
>
> Hi,
>
> I need to think about how to deal with this from a release PoV.
> But can you check if the patch below resolves your problem?

Thanks for the patch, I think this patch is necessary. But it doesn't
solve my problem. My purpose is to port the loongarch kexec-tools to
Debian. However, the Debian community's automatic build system pulls
the source from
http://kernel.org/pub/linux/utils/kernel/kexec/kexec-tools-2.0.27.tar.gz

So, my problem may still require updating the archive. But I can wait
for version 2.0.28 and then continue the Debian porting work. Should
this problem be fixed in 2.0.28?

>
> diff --git a/kexec/arch/loongarch/Makefile b/kexec/arch/loongarch/Makefile
> index cee7e569a2a2..f91d0baf049a 100644
> --- a/kexec/arch/loongarch/Makefile
> +++ b/kexec/arch/loongarch/Makefile
> @@ -19,5 +19,6 @@ loongarch_VIRT_TO_PHYS =
>  dist += kexec/arch/loongarch/Makefile $(loongarch_KEXEC_SRCS) \
>          kexec/arch/loongarch/kexec-loongarch.h \
>          kexec/arch/loongarch/image-header.h \
> +        kexec/arch/loongarch/iomem.h \
>          kexec/arch/loongarch/crashdump-loongarch.h \
>          kexec/arch/loongarch/include/arch/options.h
Re: [PATCHv8 1/5] powerpc/setup : Enable boot_cpu_hwid for PPC32
On Tue, Oct 10, 2023 at 02:38:40PM +0530, Sourabh Jain wrote: > Hello Pingfan, > > > > > With this patch series applied, the kdump kernel fails to boot on > > powerpc with nr_cpus=1. > > > > Console logs: > > --- > > [root]# echo c > /proc/sysrq-trigger > > [ 74.783235] sysrq: Trigger a crash > > [ 74.783244] Kernel panic - not syncing: sysrq triggered crash > > [ 74.783252] CPU: 58 PID: 3838 Comm: bash Kdump: loaded Not tainted > > 6.6.0-rc5pf-nr-cpus+ #3 > > [ 74.783259] Hardware name: POWER10 (raw) phyp pSeries > > [ 74.783275] Call Trace: > > [ 74.783280] [c0020f4ebac0] [c0ed9f38] > > dump_stack_lvl+0x6c/0x9c (unreliable) > > [ 74.783291] [c0020f4ebaf0] [c0150300] panic+0x178/0x438 > > [ 74.783298] [c0020f4ebb90] [c0936d48] > > sysrq_handle_crash+0x28/0x30 > > [ 74.783304] [c0020f4ebbf0] [c093773c] > > __handle_sysrq+0x10c/0x250 > > [ 74.783309] [c0020f4ebc90] [c0937fa8] > > write_sysrq_trigger+0xc8/0x168 > > [ 74.783314] [c0020f4ebcd0] [c0665d8c] > > proc_reg_write+0x10c/0x1b0 > > [ 74.783321] [c0020f4ebd00] [c058da54] > > vfs_write+0x104/0x4b0 > > [ 74.783326] [c0020f4ebdc0] [c058dfdc] > > ksys_write+0x7c/0x140 > > [ 74.783331] [c0020f4ebe10] [c0033a64] > > system_call_exception+0x144/0x3a0 > > [ 74.783337] [c0020f4ebe50] [c000c554] > > system_call_common+0xf4/0x258 > > [ 74.783343] --- interrupt: c00 at 0x7fffa0721594 > > [ 74.783352] NIP: 7fffa0721594 LR: 7fffa0697bf4 CTR: > > > > [ 74.783364] REGS: c0020f4ebe80 TRAP: 0c00 Not tainted > > (6.6.0-rc5pf-nr-cpus+) > > [ 74.783376] MSR: 8280f033 > > CR: 2802 XER: > > [ 74.783394] IRQMASK: 0 > > [ 74.783394] GPR00: 0004 7c4b6800 7fffa0807300 > > 0001 > > [ 74.783394] GPR04: 00013549ea60 0002 0010 > > > > [ 74.783394] GPR08: > > > > [ 74.783394] GPR12: 7fffa0abaf70 4000 > > 00011a0f9798 > > [ 74.783394] GPR16: 00011a0f9724 00011a097688 00011a02ff70 > > 00011a0fd568 > > [ 74.783394] GPR20: 000135554bf0 0001 00011a0aa478 > > 7c4b6a24 > > [ 74.783394] GPR24: 7c4b6a20 00011a0faf94 0002 > > 00013549ea60 > > [ 
74.783394] GPR28: 0002 7fffa08017a0 00013549ea60 > > 0002 > > [ 74.783440] NIP [7fffa0721594] 0x7fffa0721594 > > [ 74.783443] LR [7fffa0697bf4] 0x7fffa0697bf4 > > [ 74.783447] --- interrupt: c00 > > I'm in purgatory > > [ 0.00] radix-mmu: Page sizes from device-tree: > > [ 0.00] radix-mmu: Page size shift = 12 AP=0x0 > > [ 0.00] radix-mmu: Page size shift = 16 AP=0x5 > > [ 0.00] radix-mmu: Page size shift = 21 AP=0x1 > > [ 0.00] radix-mmu: Page size shift = 30 AP=0x2 > > [ 0.00] Activating Kernel Userspace Access Prevention > > [ 0.00] Activating Kernel Userspace Execution Prevention > > [ 0.00] radix-mmu: Mapped 0x-0x0001 > > with 64.0 KiB pages (exec) > > [ 0.00] radix-mmu: Mapped 0x0001-0x0020 > > with 64.0 KiB pages > > [ 0.00] radix-mmu: Mapped 0x0020-0x2000 > > with 2.00 MiB pages > > [ 0.00] radix-mmu: Mapped 0x2000-0x2260 > > with 2.00 MiB pages (exec) > > [ 0.00] radix-mmu: Mapped 0x2260-0x4000 > > with 2.00 MiB pages > > [ 0.00] radix-mmu: Mapped 0x4000-0x00018000 > > with 1.00 GiB pages > > [ 0.00] radix-mmu: Mapped 0x00018000-0x0001a000 > > with 2.00 MiB pages > > [ 0.00] lpar: Using radix MMU under hypervisor > > [ 0.00] Linux version 6.6.0-rc5pf-nr-cpus+ > > (r...@ltcever7x0-lp1.aus.stglabs.ibm.com) (gcc (GCC) 8.5.0 20210514 (Red > > Hat 8.5.0-20), GNU ld version 2.30-123.el8) #3 SMP Mon Oct 9 11:07: > > 41 CDT 2023 > > [ 0.00] Found initrd at 0xc00022e6:0xc000248f08d8 > > [ 0.00] Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200 > > 0xf06 of:IBM,FW1060.00 (NM1060_016) hv:phyp pSeries > > [ 0.00] printk: bootconsole [udbg0] enabled > > [ 0.00] the round shift between dt seq and the cpu logic number: > > 56 > > [ 0.00] BUG: Unable to handle kernel data access on write at > > 0xc001a000 > > [ 0.00] Faulting instruction address: 0xc00022009c64 > > [ 0.00] Oops: Kernel access of bad area, sig: 11 [#1] > > [ 0.00] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries > > [ 0.00] Modules linked in: > > [ 0.00] CPU: 2 PID: 0 Comm: swapper Not tainted > > 
6.6.0-rc5pf-nr-cpus+ #3 > > [ 0.00] Hardware name: POWER10 (raw)
Re: [PATCH makedumpfile V2 0/2] Add riscv64 support for makedumpfile
On 2023/10/10 23:12, Song Shuai wrote:
> Changes since V1:
> https://lore.kernel.org/kexec/20230927111822.180630-1-songshuaish...@tinylab.org/
>
> - fix a typo in Patch2's commit-msg
> - adjust some indentations of Patch1

Thank you, but I already applied the v1 patches with fixes on my end:
https://github.com/makedumpfile/makedumpfile/compare/a34f017...aee7f3b

I should have sent this link, sorry about that.

Thanks,
Kazu

>
>
> These 2 patches add riscv64 support for makedumpfile:
>
> Patch1 - Add riscv64 support
> ============================
>
> This patch adds support for riscv64 in makedumpfile.
> It implements the "vtop" for kernel memory regions
> and supports Sv39/Sv48/Sv57 page modes for RV64.
>
>
> Patch2 - riscv64: Correct the pfn_start for flatmem
> ===================================================
>
> This patch temporarily fixes an issue seen in the FLATMEM tests,
> as the commit-msg says:
>
>   To let info->max_mapnr indicate the direct max PFN and then
>   make the kdump header's max_mapnr_64 correct, riscv64 port
>   didn't define ARCH_PFN_OFFSET.
>
>   As for FLATMEM type, the pfn region of mem_map_data should
>   be adjusted to start from info->phys_base instead of zero.
>
>
> Tests
> =====
>
> With these 2 patches, the following tests passed in an RV64 QEMU virt
> machine:
>
> Preparation:
> ------------
>
> 1. build kernel with FLATMEM and SPARSE memory models
> 2. boot kernel with 3 different page-modes by setting no4lvl/no5lvl in cmdline
> 3. panic kernel
>
> Tests:
> ------
>
> 1. create kdump-compressed file via this command
>    - `/mnt/mkdf_f -d31 -f -c /proc/vmcore /mnt/dump.file1`
>    - or with `--vtop` option to translate some typical addresses
>      (like: kernel_link_addr, vmalloc_start, page_offset)
>
> 2. start crash with kdump file and do some VTOPs
>
>
> A test log:
> -----------
>
> # With the Sv57 and SPARSE_EXTREME kernel
> # vtop the vmalloc start address -- 0xff20
>
>
> # /mnt/mkdf_f --vtop 0xff20 -d31 -f --non-mmap -c /proc/vmcore /mnt/dump.file1
>
> Translating virtual address ff20 to physical address.
> VIRTUAL PHYSICAL
> ff20 80087000
>
> Copying data : [100.0 %] | eta: 0s
>
> The dumpfile is saved to /mnt/dump.file1.
>
> makedumpfile Completed.
>
> # sudo ../crash/crash /home/song/9_linux/linux/00_rv_def/vmlinux /tmp/hello/dump.file1
> ...
>       KERNEL: /home/song/9_linux/linux/00_rv_def/vmlinux
>     DUMPFILE: /tmp/hello/dump.file1 [PARTIAL DUMP]
>         CPUS: 2
>         DATE: Wed Sep 27 18:37:45 CST 2023
>       UPTIME: 00:00:18
> LOAD AVERAGE: 0.00, 0.00, 0.00
>        TASKS: 55
>     NODENAME: (none)
>      RELEASE: 6.6.0-rc1-7-g22bfc766389c
>      VERSION: #1 SMP Mon Sep 25 19:29:05 CST 2023
>      MACHINE: riscv64 (unknown Mhz)
>       MEMORY: 511.8 MB
>        PANIC: "Kernel panic - not syncing: sysrq triggered crash"
>          PID: 1
>      COMMAND: "sh"
>         TASK: ff6e [THREAD_INFO: ff6e]
>          CPU: 1
>        STATE: TASK_RUNNING (PANIC)
>
> crash> vtop 0xff20
> VIRTUAL PHYSICAL
> ff20 80087000
>
>    PGD: 814fa900 => 20010c01
>    P4D: 80043000 => 20025401
>    PUD: 80095000 => 20025801
>    PMD: 80096000 => 20026001
>    PTE: 80098000 => 20021ce7
>   PAGE: 80087000
>
>    PTE     PHYSICAL  FLAGS
> 20021ce7   80087000  (PRESENT|READ|WRITE|GLOBAL|ACCESSED|DIRTY)
>
>     PAGE      PHYSICAL  MAPPING  INDEX  CNT  FLAGS
> ff1c020021c0  80087000     0       0     1     0
> // same as the makedumpfile's vtop
>
> Song Shuai (2):
>   Add riscv64 support
>   riscv64: Correct the pfn_start for flatmem
>
>  Makefile       |   2 +-
>  arch/riscv64.c | 219 +
>  makedumpfile.c |  18
>  makedumpfile.h | 107
>  4 files changed, 345 insertions(+), 1 deletion(-)
>  create mode 100644 arch/riscv64.c
>
Re: [PATCH] kexec: Fix reboot race during device_shutdown()
Joel Fernandes writes:

> On Mon, Oct 9, 2023 at 11:30 AM Eric W. Biederman wrote:
>>
>> Joel Fernandes writes:
>>
>> > On Mon, Oct 2, 2023 at 2:18 PM Joel Fernandes wrote:
>> > [..]
>> >> > > Such freezing is already being done if kernel supports KEXEC_JUMP and
>> >> > > kexec_image->preserve_context is true. However, doing it if either of
>> >> > > these are not true prevents crashes/races.
>> >> >
>> >> > The KEXEC_JUMP case is something else entirely. It is supposed to work
>> >> > like suspend to RAM. Maybe reboot should as well, but I am
>> >> > uncomfortable making a generic device fix kexec specific.
>> >>
>> >> I see your point of view. I think regular reboot should also be fixed
>> >> to avoid similar crash possibilities. I am happy to make a change for
>> >> that similar to this patch if we want to proceed that way.
>> >>
>> >> Thoughts?
>> >
>> > Just checking how we want to proceed, is the consensus that we should
>> > prevent kernel crashes without relying on userspace stopping all
>> > processes? Should we fix regular reboot syscall as well and not just
>> > kexec reboot?
>>
>> It just occurred to me there is something very fishy about all of this.
>>
>> What userspace do you have using kexec (not kexec on panic) that doesn't
>> perform the same userspace shutdown as a normal reboot?
>>
>> Quite frankly such a userspace is buggy, and arguably that is where you
>> should start fixing things.
>
> It is a simple unit test that tests kexec support by kexec-rebooting
> the kernel. I don't think SIGSTOP/SIGKILL'ing during kexec-reboot is
> ideal because in a real panic-on-kexec type crash, that may not happen
> and so does not emulate the real world that well. I think we want the
> kexec-reboot to do a *reboot* without crashing the kernel while doing
> so. Ricardo/Steve can chime on what they feel as well.

This is confusing. You have a unit test that tests kexec on panic
using the full kexec reboot.

The two are fundamentally similar but you aren't going to have a valid
test case if you mix them.

There is a whole kernel module that tests more interesting cases; for
the simple case you probably just want to do:

echo 'p' > /proc/sysrq-trigger

At least I think it is p that causes a kernel panic. That will ensure
you are exercising the kexec on panic code path, which performs the
minimal shutdown in the kernel.

>> That way you can get the orderly shutdown
>> of userspace daemons/services along with an orderly shutdown of
>> everything the kernel is responsible for.
>
> Fixing in userspace is an option but people are not happy that the
> kernel can crash like that.

In a kexec on panic scenario the kernel needs to perform the absolute
bare essential shutdown before calling kexec (basically nothing).
During kexec-on-panic nothing can be relied upon because we don't know
what is broken. If that is what you care about (as suggested by the
unit test) you need to fix the device initialization.

In a normal kexec scenario the whole normal reboot process is expected.
I have no problems with fixing the kernel to handle that scenario, but
in the real world the entire orderly shutdown, both kernel and
userspace, should be performed.

>> At the kernel level a kexec reboot and a normal reboot have been
>> deliberately kept as close as possible. Which is why I say we should
>> fix it in reboot.
>
> You mean fix it in userspace?

No. I mean that in the kernel the orderly shutdown for a kexec reboot
and an ordinary reboot are kept as close to the same as possible.

It should be the case that the only difference between the two is that
in one case system firmware takes over after the orderly shutdown, and
in the other case a new kernel takes over after the orderly shutdown.

Eric
Re: [PATCH] kexec: Fix reboot race during device_shutdown()
On Mon, Oct 9, 2023 at 10:00 AM Steven Rostedt wrote:
>
> On Sat, 7 Oct 2023 21:30:42 -0400
> Joel Fernandes wrote:
>
> > Just checking how we want to proceed, is the consensus that we should
> > prevent kernel crashes without relying on userspace stopping all
> > processes? Should we fix regular reboot syscall as well and not just
> > kexec reboot?
>
> If you can show that we can trigger the crash on normal reboot, then I
> don't see why not. That is, if you have a program that does the reboot
> (without the SIGSTOP/SIGKILL calls) and triggers this crash, I think that's
> a legitimate reason to fix it on normal reboot too.

OK, sounds good, thanks for sharing your thoughts.

 - Joel
Re: [PATCH] kexec: Fix reboot race during device_shutdown()
On Mon, Oct 9, 2023 at 11:30 AM Eric W. Biederman wrote:
>
> Joel Fernandes writes:
>
> > On Mon, Oct 2, 2023 at 2:18 PM Joel Fernandes wrote:
> > [..]
> >> > > Such freezing is already being done if kernel supports KEXEC_JUMP and
> >> > > kexec_image->preserve_context is true. However, doing it if either of
> >> > > these are not true prevents crashes/races.
> >> >
> >> > The KEXEC_JUMP case is something else entirely. It is supposed to work
> >> > like suspend to RAM. Maybe reboot should as well, but I am
> >> > uncomfortable making a generic device fix kexec specific.
> >>
> >> I see your point of view. I think regular reboot should also be fixed
> >> to avoid similar crash possibilities. I am happy to make a change for
> >> that similar to this patch if we want to proceed that way.
> >>
> >> Thoughts?
> >
> > Just checking how we want to proceed, is the consensus that we should
> > prevent kernel crashes without relying on userspace stopping all
> > processes? Should we fix regular reboot syscall as well and not just
> > kexec reboot?
>
> It just occurred to me there is something very fishy about all of this.
>
> What userspace do you have using kexec (not kexec on panic) that doesn't
> perform the same userspace shutdown as a normal reboot?
>
> Quite frankly such a userspace is buggy, and arguably that is where you
> should start fixing things.

It is a simple unit test that tests kexec support by kexec-rebooting
the kernel. I don't think SIGSTOP/SIGKILL'ing during kexec-reboot is
ideal because in a real panic-on-kexec type crash, that may not happen
and so does not emulate the real world that well. I think we want the
kexec-reboot to do a *reboot* without crashing the kernel while doing
so. Ricardo/Steve can chime in on what they feel as well.

> That way you can get the orderly shutdown
> of userspace daemons/services along with an orderly shutdown of
> everything the kernel is responsible for.

Fixing in userspace is an option but people are not happy that the
kernel can crash like that.

> At the kernel level a kexec reboot and a normal reboot have been
> deliberately kept as close as possible. Which is why I say we should
> fix it in reboot.

You mean fix it in userspace?

thanks,

 - Joel
[PATCH makedumpfile V2 2/2] riscv64: Correct the pfn_start for flatmem
To let info->max_mapnr indicate the direct max PFN and then
make the kdump header's max_mapnr_64 correct, the riscv64 port
does not define ARCH_PFN_OFFSET.

As for FLATMEM type, the pfn region of mem_map_data should
be adjusted to start from info->phys_base instead of zero.

Signed-off-by: Song Shuai
---
 makedumpfile.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/makedumpfile.c b/makedumpfile.c
index 42d5565..3705bdd 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -3302,7 +3302,11 @@ get_mm_flatmem(void)
     if (is_xen_memory())
         dump_mem_map(0, info->dom0_mapnr, mem_map, 0);
     else
+#ifdef __riscv64__
+        dump_mem_map((info->phys_base >> PAGESHIFT()), info->max_mapnr, mem_map, 0);
+#else
         dump_mem_map(0, info->max_mapnr, mem_map, 0);
+#endif

     return TRUE;
 }
--
2.20.1
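As a worked example of the arithmetic in this patch, the sketch below computes the first PFN of a FLATMEM region from the physical base. The base address 0x80000000 and the 4 KiB page shift are assumptions chosen for illustration (a typical DRAM base on the QEMU virt machine); the helper names are hypothetical and do not exist in makedumpfile.

```c
#include <stdint.h>

/* First PFN covered by mem_map when RAM does not start at address 0. */
static uint64_t flatmem_pfn_start(uint64_t phys_base, unsigned int page_shift)
{
    return phys_base >> page_shift;
}

/*
 * Hypothetical helper: position of a PFN inside the flat mem_map once
 * the region is taken to start at pfn_start rather than at zero.
 */
static uint64_t flatmem_page_index(uint64_t pfn, uint64_t pfn_start)
{
    return pfn - pfn_start;
}
```

Under those assumptions the region starts at PFN 0x80000 rather than 0, which is the adjustment the #ifdef branch passes to dump_mem_map().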
[PATCH makedumpfile V2 0/2] Add riscv64 support for makedumpfile
Changes since V1:
https://lore.kernel.org/kexec/20230927111822.180630-1-songshuaish...@tinylab.org/

- fix a typo in Patch2's commit-msg
- adjust some indentation of Patch1

These 2 patches add riscv64 support for makedumpfile:

Patch1 - Add riscv64 support
===

This patch adds support for riscv64 in makedumpfile. It implements the
"vtop" for kernel memory regions and supports Sv39/Sv48/Sv57 page modes
for RV64.

Patch2 - riscv64: Correct the pfn_start for flatmem
==

This patch temporarily fixes an issue of the tests about FLATMEM, as the
commit-msg says:

	To let info->max_mapnr indicate the direct max PFN and then make
	the kdump header's max_mapnr_64 correct, riscv64 port didn't
	define ARCH_PFN_OFFSET.

	As for FLATMEM type, the pfn region of mem_map_data should be
	adjusted to start from info->phys_base instead of zero.

Tests
=

With these 2 patches, the following tests passed on the RV64 QEMU virt
machine:

Preparation:
---

1. build kernel with FLATMEM and SPARSE memory models
2. boot kernel with 3 different page-modes by setting no4lvl/no5lvl in cmdline
3. panic kernel

Tests:
-

1. create kdump-compressed file via this command
   - `/mnt/mkdf_f -d31 -f -c /proc/vmcore /mnt/dump.file1`
   - or with `--vtop` option to translate some typical addresses
     (like: kernel_link_addr, vmalloc_start, page_offset)
2. start crash with kdump file and do some VTOPs

A test log:
---

# With the Sv57 and SPARSE_EXTREME kernel
# vtop the vmalloc start address -- 0xff20

# /mnt/mkdf_f --vtop 0xff20 -d31 -f --non-mmap -c /proc/vmcore /mnt/dump.file1
Translating virtual address ff20 to physical address.
VIRTUAL PHYSICAL
ff20 80087000
Copying data : [100.0 %] | eta: 0s
The dumpfile is saved to /mnt/dump.file1.
makedumpfile Completed.

# sudo ../crash/crash /home/song/9_linux/linux/00_rv_def/vmlinux /tmp/hello/dump.file1
...
      KERNEL: /home/song/9_linux/linux/00_rv_def/vmlinux
    DUMPFILE: /tmp/hello/dump.file1 [PARTIAL DUMP]
        CPUS: 2
        DATE: Wed Sep 27 18:37:45 CST 2023
      UPTIME: 00:00:18
LOAD AVERAGE: 0.00, 0.00, 0.00
       TASKS: 55
    NODENAME: (none)
     RELEASE: 6.6.0-rc1-7-g22bfc766389c
     VERSION: #1 SMP Mon Sep 25 19:29:05 CST 2023
     MACHINE: riscv64 (unknown Mhz)
      MEMORY: 511.8 MB
       PANIC: "Kernel panic - not syncing: sysrq triggered crash"
         PID: 1
     COMMAND: "sh"
        TASK: ff6e [THREAD_INFO: ff6e]
         CPU: 1
       STATE: TASK_RUNNING (PANIC)

crash> vtop 0xff20
VIRTUAL PHYSICAL
ff20 80087000

 PGD: 814fa900 => 20010c01
 P4D: 80043000 => 20025401
 PUD: 80095000 => 20025801
 PMD: 80096000 => 20026001
 PTE: 80098000 => 20021ce7
PAGE: 80087000

PTE PHYSICAL FLAGS
20021ce7 80087000 (PRESENT|READ|WRITE|GLOBAL|ACCESSED|DIRTY)

PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ff1c020021c0 8008700000 1 0

// same as the makedumpfile's vtop

Song Shuai (2):
  Add riscv64 support
  riscv64: Correct the pfn_start for flatmem

 Makefile       |   2 +-
 arch/riscv64.c | 219 +
 makedumpfile.c |  18
 makedumpfile.h | 107
 4 files changed, 345 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv64.c

--
2.20.1
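As a rough illustration of how the three page modes mentioned in this cover letter differ, the sketch below derives the number of page-table levels and the top-level table index from the virtual address width. It assumes the usual RISC-V layout of a 12-bit page offset and 9 index bits per level; page_table_levels() and top_level_index() are hypothetical helpers and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SHIFT 12   /* 4 KiB pages (assumed) */
#define EX_INDEX_BITS 9    /* 512 entries per table (assumed) */

/* va_bits is 39 for Sv39, 48 for Sv48, 57 for Sv57. */
int page_table_levels(int va_bits)
{
	assert(va_bits == 39 || va_bits == 48 || va_bits == 57);
	return (va_bits - EX_PAGE_SHIFT) / EX_INDEX_BITS;
}

/* Index into the top-level table for the given virtual address. */
unsigned int top_level_index(uint64_t vaddr, int va_bits)
{
	int shift;

	assert(va_bits == 39 || va_bits == 48 || va_bits == 57);
	shift = va_bits - EX_INDEX_BITS;   /* 30, 39 or 48 */
	return (unsigned int)((vaddr >> shift) & ((1u << EX_INDEX_BITS) - 1));
}
```

Under these assumptions Sv39 walks 3 levels, Sv48 walks 4 and Sv57 walks 5, which lines up with the PGD/P4D/PUD/PMD/PTE chain shown in the Sv57 log above.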
[PATCH makedumpfile V2 1/2] Add riscv64 support
This patch adds support for riscv64 in makedumpfile. It implements the
"vtop" for kernel memory regions and supports Sv39/Sv48/Sv57 page modes
for RV64.

Signed-off-by: Song Shuai
---
 Makefile       |   2 +-
 arch/riscv64.c | 219 +
 makedumpfile.c |  14
 makedumpfile.h | 107
 4 files changed, 341 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv64.c

diff --git a/Makefile b/Makefile
index 0608035..1d0644c 100644
--- a/Makefile
+++ b/Makefile
@@ -47,7 +47,7 @@ endif
 SRC_BASE = makedumpfile.c makedumpfile.h diskdump_mod.h sadump_mod.h sadump_info.h
 SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c
 OBJ_PART=$(patsubst %.c,%.o,$(SRC_PART))
-SRC_ARCH = arch/arm.c arch/arm64.c arch/x86.c arch/x86_64.c arch/ia64.c arch/ppc64.c arch/s390x.c arch/ppc.c arch/sparc64.c arch/mips64.c arch/loongarch64.c
+SRC_ARCH = arch/arm.c arch/arm64.c arch/x86.c arch/x86_64.c arch/ia64.c arch/ppc64.c arch/s390x.c arch/ppc.c arch/sparc64.c arch/mips64.c arch/loongarch64.c arch/riscv64.c
 OBJ_ARCH=$(patsubst %.c,%.o,$(SRC_ARCH))

 LIBS = -ldw -lbz2 -ldl -lelf -lz
diff --git a/arch/riscv64.c b/arch/riscv64.c
new file mode 100644
index 000..b4101e7
--- /dev/null
+++ b/arch/riscv64.c
@@ -0,0 +1,219 @@
+/*
+ * riscv64.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+#ifdef __riscv64__
+
+#include "../print_info.h"
+#include "../elf_info.h"
+#include "../makedumpfile.h"
+
+int
+get_phys_base_riscv64(void)
+{
+	if (NUMBER(phys_ram_base) != NOT_FOUND_NUMBER)
+		info->phys_base = NUMBER(phys_ram_base);
+	else
+		/* In case that you are using qemu rv64 env */
+		info->phys_base = 0x8020;
+
+	DEBUG_MSG("phys_base: %lx\n", info->phys_base);
+	return TRUE;
+}
+
+int
+get_machdep_info_riscv64(void)
+{
+
+	if(NUMBER(va_bits) == NOT_FOUND_NUMBER || NUMBER(page_offset) == NOT_FOUND_NUMBER ||
+	   NUMBER(vmalloc_start) == NOT_FOUND_NUMBER || NUMBER(vmalloc_end) == NOT_FOUND_NUMBER ||
+	   NUMBER(vmemmap_start) == NOT_FOUND_NUMBER || NUMBER(vmemmap_end) == NOT_FOUND_NUMBER ||
+	   NUMBER(modules_vaddr) == NOT_FOUND_NUMBER || NUMBER(modules_end) == NOT_FOUND_NUMBER ||
+	   NUMBER(kernel_link_addr) == NOT_FOUND_NUMBER || NUMBER(va_kernel_pa_offset) == NOT_FOUND_NUMBER)
+		return FALSE;
+
+	if (NUMBER(MAX_PHYSMEM_BITS) != NOT_FOUND_NUMBER)
+		info->max_physmem_bits = NUMBER(MAX_PHYSMEM_BITS);
+	else
+		info->max_physmem_bits = _MAX_PHYSMEM_BITS;
+
+	if (NUMBER(SECTION_SIZE_BITS) != NOT_FOUND_NUMBER)
+		info->section_size_bits = NUMBER(SECTION_SIZE_BITS);
+	else
+		info->section_size_bits = _SECTION_SIZE_BITS;
+
+	info->page_offset = NUMBER(page_offset);
+
+	DEBUG_MSG("va_bits: %ld\n", NUMBER(va_bits));
+	DEBUG_MSG("page_offset: %lx\n", NUMBER(page_offset));
+	DEBUG_MSG("vmalloc_start: %lx\n", NUMBER(vmalloc_start));
+	DEBUG_MSG("vmalloc_end: %lx\n", NUMBER(vmalloc_end));
+	DEBUG_MSG("vmemmap_start: %lx\n", NUMBER(vmemmap_start));
+	DEBUG_MSG("vmemmap_end: %lx\n", NUMBER(vmemmap_end));
+	DEBUG_MSG("modules_vaddr: %lx\n", NUMBER(modules_vaddr));
+	DEBUG_MSG("modules_end: %lx\n", NUMBER(modules_end));
+	DEBUG_MSG("kernel_link_addr: %lx\n", NUMBER(kernel_link_addr));
+	DEBUG_MSG("va_kernel_pa_offset: %lx\n", NUMBER(va_kernel_pa_offset));
+
+	return TRUE;
+}
+
+/*
+ * For direct memory mapping
+ */
+
+#define VTOP(X) ({ \
+	ulong _X = X; \
+	(_X) >= NUMBER(kernel_link_addr) ? ((_X) - (NUMBER(va_kernel_pa_offset))): \
+	((_X) - PAGE_OFFSET + (info->phys_base)); \
+	})
+
+static unsigned long long
+vtop_riscv64(pgd_t * pgd, unsigned long vaddr, long va_bits)
+{
+	unsigned long long paddr = NOT_PADDR;
+	pgd_t *pgda;
+	p4d_t *p4da;
+	pud_t *puda;
+	pmd_t *pmda;
+	pte_t *ptea;
+	ulong pt_val, pt_phys;
+
+#define pgd_index(X) ((va_bits == VA_BITS_SV57) ? pgd_index_l5(X) :\
+	((va_bits == VA_BITS_SV48) ? pgd_index_l4(X) : pgd_index_l3(X)))
+
+	/* PGD */
+	pgda =
Re: [PATCH 04/13] x86/kvm: Do not try to disable kvmclock if it was not enabled
On 10/5/2023 6:13 AM, Kirill A. Shutemov wrote:
> kvm_guest_cpu_offline() tries to disable kvmclock regardless if it is
> present in the VM. It leads to write to a MSR that doesn't exist on some
> configurations, namely in TDX guest:
>
> 	unchecked MSR access error: WRMSR to 0x12 (tried to write 0x)
> 	at rIP: 0x8110687c (kvmclock_disable+0x1c/0x30)
>
> kvmclock enabling is gated by CLOCKSOURCE and CLOCKSOURCE2 KVM paravirt
> features.
>
> Do not disable kvmclock if it was not enumerated or disabled by user
> from kernel command line.

For the above warning, checking for the CLOCKSOURCE and CLOCKSOURCE2
features is sufficient, right? Do we need to include the
user/command-line disable check here?

>
> Signed-off-by: Kirill A. Shutemov
> Fixes: c02027b5742b ("x86/kvm: Disable kvmclock on all CPUs on shutdown")
> ---
>  arch/x86/kernel/kvmclock.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index fb8f52149be9..cba2e732e53f 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -22,7 +22,7 @@
>  #include
>  #include
>
> -static int kvmclock __initdata = 1;
> +static int kvmclock __ro_after_init = 1;
>  static int kvmclock_vsyscall __initdata = 1;
>  static int msr_kvm_system_time __ro_after_init = MSR_KVM_SYSTEM_TIME;
>  static int msr_kvm_wall_clock __ro_after_init = MSR_KVM_WALL_CLOCK;
> @@ -195,7 +195,12 @@ static void kvm_setup_secondary_clock(void)
>
>  void kvmclock_disable(void)
>  {
> -	native_write_msr(msr_kvm_system_time, 0, 0);
> +	if (!kvm_para_available() || !kvmclock)
> +		return;
> +
> +	if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE) ||
> +	    kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2))
> +		native_write_msr(msr_kvm_system_time, 0, 0);
>  }
>
>  static void __init kvmclock_init_mem(void)

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
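The gating added by this patch can be modelled as a small truth function, which may help when weighing whether the command-line check is needed on top of the feature check. This is a userspace sketch only: struct kvmclock_env and would_write_system_time_msr() are hypothetical names, and the four booleans merely stand in for kvm_para_available(), the kvmclock variable and the two kvm_para_has_feature() calls in the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the inputs kvmclock_disable() consults. */
struct kvmclock_env {
	bool para_available;    /* stands in for kvm_para_available() */
	bool kvmclock_enabled;  /* stands in for the kvmclock variable */
	bool has_clocksource;   /* KVM_FEATURE_CLOCKSOURCE enumerated */
	bool has_clocksource2;  /* KVM_FEATURE_CLOCKSOURCE2 enumerated */
};

/* Returns true when the MSR write in kvmclock_disable() would be issued. */
bool would_write_system_time_msr(const struct kvmclock_env *env)
{
	assert(env != NULL);

	/* Early return: no KVM paravirt interface, or kvmclock turned off. */
	if (!env->para_available || !env->kvmclock_enabled)
		return false;

	/* The write happens only if at least one clocksource feature exists. */
	return env->has_clocksource || env->has_clocksource2;
}
```

In this model the feature check alone is enough to suppress the write when neither CLOCKSOURCE bit is enumerated, which is the case behind the warning; the kvmclock flag only changes the outcome when a feature bit is present but kvmclock was turned off.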
Re: [PATCH 03/13] cpu/hotplug, x86/acpi: Disable CPU hotplug for ACPI MADT wakeup
On 10/5/2023 6:13 AM, Kirill A. Shutemov wrote:
> ACPI MADT doesn't allow to offline CPU after it got woke up.
>

I think you can use the term "CPU hotplug" instead of just offline.

> Currently hotplug prevented based on the confidential computing
> attribute which is set for Intel TDX. But TDX is not the only possible
> user of the wake up method.
>
> Mark CPU hotplug as "not supported" on ACPI MADT wakeup enumeration.

Looks good to me.

Reviewed-by: Kuppuswamy Sathyanarayanan

>
> Signed-off-by: Kirill A. Shutemov
> ---
>  arch/x86/coco/core.c               |  1 -
>  arch/x86/kernel/acpi/madt_wakeup.c |  4
>  include/linux/cc_platform.h        | 10 --
>  kernel/cpu.c                       |  2 +-
>  4 files changed, 5 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
> index eeec9986570e..f07c3bb7deab 100644
> --- a/arch/x86/coco/core.c
> +++ b/arch/x86/coco/core.c
> @@ -20,7 +20,6 @@ static bool noinstr intel_cc_platform_has(enum cc_attr attr)
>  {
>  	switch (attr) {
>  	case CC_ATTR_GUEST_UNROLL_STRING_IO:
> -	case CC_ATTR_HOTPLUG_DISABLED:
>  	case CC_ATTR_GUEST_MEM_ENCRYPT:
>  	case CC_ATTR_MEM_ENCRYPT:
>  		return true;
> diff --git a/arch/x86/kernel/acpi/madt_wakeup.c b/arch/x86/kernel/acpi/madt_wakeup.c
> index 1b9747bfd5b9..15bdf10b1393 100644
> --- a/arch/x86/kernel/acpi/madt_wakeup.c
> +++ b/arch/x86/kernel/acpi/madt_wakeup.c
> @@ -1,4 +1,5 @@
>  #include
> +#include
>  #include
>
>  /* Physical address of the Multiprocessor Wakeup Structure mailbox */
> @@ -74,6 +75,9 @@ int __init acpi_parse_mp_wake(union acpi_subtable_headers *header,
>
>  	acpi_mp_wake_mailbox_paddr = mp_wake->base_address;
>
> +	/* Disable CPU onlining/offlining */
> +	cpu_hotplug_not_supported();
> +
>  	apic_update_callback(wakeup_secondary_cpu_64, acpi_wakeup_cpu);
>
>  	return 0;
> diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
> index cb0d6cd1c12f..d08dd65b5c43 100644
> --- a/include/linux/cc_platform.h
> +++ b/include/linux/cc_platform.h
> @@ -80,16 +80,6 @@ enum cc_attr {
>  	 * using AMD SEV-SNP features.
>  	 */
>  	CC_ATTR_GUEST_SEV_SNP,
> -
> -	/**
> -	 * @CC_ATTR_HOTPLUG_DISABLED: Hotplug is not supported or disabled.
> -	 *
> -	 * The platform/OS is running as a guest/virtual machine does not
> -	 * support CPU hotplug feature.
> -	 *
> -	 * Examples include TDX Guest.
> -	 */
> -	CC_ATTR_HOTPLUG_DISABLED,
>  };
>
>  #ifdef CONFIG_ARCH_HAS_CC_PLATFORM
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index cf536fe1a88a..9d4279476b40 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1522,7 +1522,7 @@ static int cpu_down_maps_locked(unsigned int cpu, enum cpuhp_state target)
>  	 * If the platform does not support hotplug, report it explicitly to
>  	 * differentiate it from a transient offlining failure.
>  	 */
> -	if (cc_platform_has(CC_ATTR_HOTPLUG_DISABLED) || !cpu_hotplug_supported)
> +	if (!cpu_hotplug_supported)
>  		return -EOPNOTSUPP;
>  	if (cpu_hotplug_disabled)
>  		return -EBUSY;

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
Re: [PATCH 02/13] kernel/cpu: Add support for declaring CPU hotplug not supported
On 10/5/2023 6:13 AM, Kirill A. Shutemov wrote:
> The function cpu_hotplug_not_supported() can be called to indicate that
> CPU hotplug should be disabled. It does not prevent the initial bring up
> of the CPU, but it stops subsequent offlining.
>
> This function is intended to replace CC_ATTR_HOTPLUG_DISABLED.
>

Looks good to me.

Reviewed-by: Kuppuswamy Sathyanarayanan

> Signed-off-by: Kirill A. Shutemov
> ---
>  include/linux/cpu.h |  2 ++
>  kernel/cpu.c        | 17 -
>  2 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> index f19f56501809..aab3887cadbc 100644
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -132,6 +132,7 @@ extern void cpus_read_lock(void);
>  extern void cpus_read_unlock(void);
>  extern int cpus_read_trylock(void);
>  extern void lockdep_assert_cpus_held(void);
> +extern void cpu_hotplug_not_supported(void);
>  extern void cpu_hotplug_disable(void);
>  extern void cpu_hotplug_enable(void);
>  void clear_tasks_mm_cpumask(int cpu);
> @@ -147,6 +148,7 @@ static inline void cpus_read_lock(void) { }
>  static inline void cpus_read_unlock(void) { }
>  static inline int cpus_read_trylock(void) { return true; }
>  static inline void lockdep_assert_cpus_held(void) { }
> +static inline void cpu_hotplug_not_supported(void) { }
>  static inline void cpu_hotplug_disable(void) { }
>  static inline void cpu_hotplug_enable(void) { }
>  static inline int remove_cpu(unsigned int cpu) { return -EPERM; }
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 6de7c6bb74ee..cf536fe1a88a 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -484,6 +484,9 @@ static int cpu_hotplug_disabled;
>
>  DEFINE_STATIC_PERCPU_RWSEM(cpu_hotplug_lock);
>
> +/* Cleared if platform declares CPU hotplug not supported */
> +static bool cpu_hotplug_supported = true;
> +
>  void cpus_read_lock(void)
>  {
>  	percpu_down_read(&cpu_hotplug_lock);
> @@ -543,6 +546,18 @@ static void lockdep_release_cpus_lock(void)
>  	rwsem_release(&cpu_hotplug_lock.dep_map, _THIS_IP_);
>  }
>
> +/*
> + * Declare CPU hotplug not supported.
> + *
> + * It doesn't prevent initial bring up of the CPU, but stops offlining.
> + */
> +void cpu_hotplug_not_supported(void)
> +{
> +	cpu_maps_update_begin();
> +	cpu_hotplug_supported = false;
> +	cpu_maps_update_done();
> +}

Since this function is not used in this patch, do you need to add
__maybe_unused to avoid warnings?

> +
>  /*
>   * Wait for currently running CPU hotplug operations to complete (if any) and
>   * disable future CPU hotplug (from sysfs). The 'cpu_add_remove_lock' protects
> @@ -1507,7 +1522,7 @@ static int cpu_down_maps_locked(unsigned int cpu, enum cpuhp_state target)
>  	 * If the platform does not support hotplug, report it explicitly to
>  	 * differentiate it from a transient offlining failure.
>  	 */
> -	if (cc_platform_has(CC_ATTR_HOTPLUG_DISABLED))
> +	if (cc_platform_has(CC_ATTR_HOTPLUG_DISABLED) || !cpu_hotplug_supported)
>  		return -EOPNOTSUPP;
>  	if (cpu_hotplug_disabled)
>  		return -EBUSY;

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
Re: [kexec-tools] Archive file is missed iomem.h file under loongarch architecture.
On Mon, Oct 09, 2023 at 05:47:43PM +0800, Ming Wang wrote:
> Hi, maintainers,
>
> I get the kexec-tools 2.0.27 from
> http://kernel.org/pub/linux/utils/kernel/kexec/kexec-tools-2.0.27.tar.gz,
>
> But I noticed that the kexec-tools-2.0.27/kexec/arch/loongarch/iomem.h file
> was missing from this archive.
>
> This causes build errors in many distributions, like debian. The error
> message is as follows,
>
> make[1]: *** [Makefile:123: kexec/arch/loongarch/crashdump-loongarch.o] Error 1
> kexec/arch/loongarch/kexec-loongarch.c:27:10: fatal error: iomem.h: No such file or directory
>    27 | #include "iomem.h"
>
> See also: https://buildd.debian.org/status/package.php?p=kexec-tools=sid
>
> Can this archive be repaired and updated?
>
> Thanks,
> Ming

Hi,

I need to think about how to deal with this from a release PoV.
But can you check if the patch below resolves your problem?

diff --git a/kexec/arch/loongarch/Makefile b/kexec/arch/loongarch/Makefile
index cee7e569a2a2..f91d0baf049a 100644
--- a/kexec/arch/loongarch/Makefile
+++ b/kexec/arch/loongarch/Makefile
@@ -19,5 +19,6 @@ loongarch_VIRT_TO_PHYS =
 dist += kexec/arch/loongarch/Makefile $(loongarch_KEXEC_SRCS) \
 	kexec/arch/loongarch/kexec-loongarch.h \
 	kexec/arch/loongarch/image-header.h \
+	kexec/arch/loongarch/iomem.h \
 	kexec/arch/loongarch/crashdump-loongarch.h \
 	kexec/arch/loongarch/include/arch/options.h
Re: [PATCHv8 2/5] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt
On 09/10/23 5:00 pm, Pingfan Liu wrote: *** Idea *** For kexec -p, the boot cpu can be not the cpu0, this causes the problem of allocating memory for paca_ptrs[]. However, in theory, there is no requirement to assign cpu's logical id as its present sequence in the device tree. But there is something like cpu_first_thread_sibling(), which makes assumption on the mapping inside a core. Hence partially loosening the mapping, i.e. unbind the mapping of core while keep the mapping inside a core. *** Implement *** At this early stage, there are plenty of memory to utilize. Hence, this patch allocates interim memory to link the cpu info on a list, then reorder cpus by changing the list head. As a result, there is a rotate shift between the sequence number in dt and the cpu logical number. *** Result *** After this patch, a boot-cpu's logical id will always be mapped into the range [0,threads_per_core). Besides this, at this phase, all threads in the boot core are forced to be onlined. This restriction will be lifted in a later patch with extra effort. 
Signed-off-by: Pingfan Liu Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Christophe Leroy Cc: Mahesh Salgaonkar Cc: Wen Xiong Cc: Baoquan He Cc: Ming Lei Cc: kexec@lists.infradead.org To: linuxppc-...@lists.ozlabs.org --- arch/powerpc/kernel/prom.c | 25 + arch/powerpc/kernel/setup-common.c | 87 +++--- 2 files changed, 85 insertions(+), 27 deletions(-) diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c index ec82f5bda908..87272a2d8c10 100644 --- a/arch/powerpc/kernel/prom.c +++ b/arch/powerpc/kernel/prom.c @@ -76,7 +76,9 @@ u64 ppc64_rma_size; unsigned int boot_cpu_node_count __ro_after_init; #endif static phys_addr_t first_memblock_size; +#ifdef CONFIG_SMP static int __initdata boot_cpu_count; +#endif static int __init early_parse_mem(char *p) { @@ -331,8 +333,7 @@ static int __init early_init_dt_scan_cpus(unsigned long node, const __be32 *intserv; int i, nthreads; int len; - int found = -1; - int found_thread = 0; + bool found = false; /* We are scanning "cpu" nodes only */ if (type == NULL || strcmp(type, "cpu") != 0) @@ -355,8 +356,15 @@ static int __init early_init_dt_scan_cpus(unsigned long node, for (i = 0; i < nthreads; i++) { if (be32_to_cpu(intserv[i]) == fdt_boot_cpuid_phys(initial_boot_params)) { - found = boot_cpu_count; - found_thread = i; + /* +* always map the boot-cpu logical id into the +* range of [0, thread_per_core) +*/ + boot_cpuid = i; + found = true; + /* This works around the hole in paca_ptrs[]. 
*/ + if (nr_cpu_ids < nthreads) + set_nr_cpu_ids(nthreads); } #ifdef CONFIG_SMP /* logical cpu id is always 0 on UP kernels */ @@ -365,14 +373,13 @@ static int __init early_init_dt_scan_cpus(unsigned long node, } /* Not the boot CPU */ - if (found < 0) + if (!found) return 0; - DBG("boot cpu: logical %d physical %d\n", found, - be32_to_cpu(intserv[found_thread])); - boot_cpuid = found; + DBG("boot cpu: logical %d physical %d\n", boot_cpuid, + be32_to_cpu(intserv[boot_cpuid])); - boot_cpu_hwid = be32_to_cpu(intserv[found_thread]); + boot_cpu_hwid = be32_to_cpu(intserv[boot_cpuid]); /* * PAPR defines "logical" PVR values for cpus that diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c index 1b19a9815672..81291e13dec0 100644 --- a/arch/powerpc/kernel/setup-common.c +++ b/arch/powerpc/kernel/setup-common.c @@ -36,6 +36,7 @@ #include #include #include +#include #include #include #include @@ -425,6 +426,13 @@ static void __init cpu_init_thread_core_maps(int tpc) u32 *cpu_to_phys_id = NULL; +struct interrupt_server_node { + struct list_head node; + boolavail; + int len; + __be32 *intserv; +}; + /** * setup_cpu_maps - initialize the following cpu maps: * cpu_possible_mask @@ -446,11 +454,16 @@ u32 *cpu_to_phys_id = NULL; void __init smp_setup_cpu_maps(void) { struct device_node *dn; - int cpu = 0; - int nthreads = 1; + int shift = 0, cpu = 0; + int j, nthreads = 1; + int len; + struct interrupt_server_node *intserv_node, *n; + struct list_head *bt_node, head; + bool avail, found_boot_cpu = false; DBG("smp_setup_cpu_maps()\n"); + INIT_LIST_HEAD(); cpu_to_phys_id = memblock_alloc(nr_cpu_ids * sizeof(u32), __alignof__(u32)); if
Re: [PATCH 03/13] cpu/hotplug, x86/acpi: Disable CPU hotplug for ACPI MADT wakeup
> /* Physical address of the Multiprocessor Wakeup Structure mailbox */
> @@ -74,6 +75,9 @@ int __init acpi_parse_mp_wake(union acpi_subtable_headers *header,
>
>  	acpi_mp_wake_mailbox_paddr = mp_wake->base_address;
>
> +	/* Disable CPU onlining/offlining */
> +	cpu_hotplug_not_supported();
> +

Both onlining/offlining are prevented, or just offlining?

The previous patch says:

	It does not prevent the initial bring up of the CPU, but it stops
	subsequent offlining.

And ...

[...]

> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1522,7 +1522,7 @@ static int cpu_down_maps_locked(unsigned int cpu, enum cpuhp_state target)
>  	 * If the platform does not support hotplug, report it explicitly to
>  	 * differentiate it from a transient offlining failure.
>  	 */
> -	if (cc_platform_has(CC_ATTR_HOTPLUG_DISABLED) || !cpu_hotplug_supported)
> +	if (!cpu_hotplug_supported)
>  		return -EOPNOTSUPP;
>  	if (cpu_hotplug_disabled)
>  		return -EBUSY;

... here cpu_down_maps_locked() only prevents offlining if I am reading
correctly.

Also, can we rename cpu_hotplug_supported to cpu_offline_supported to match
the behaviour better?

Anyway, isn't it a little bit odd to have:

	if (!cpu_hotplug_supported)
		return -EOPNOTSUPP;
	if (cpu_hotplug_disabled)
		return -EBUSY;

?
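To make the question about the two checks concrete, here is a small userspace model of them as they read after this patch. It is a sketch only: struct hotplug_model and model_cpu_down_check() are hypothetical names, and the real cpu_down_maps_locked() does more than this.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the two pieces of state consulted on CPU offline. */
struct hotplug_model {
	bool supported;  /* models cpu_hotplug_supported (permanent property) */
	int disabled;    /* models cpu_hotplug_disabled (transient counter) */
};

/* Returns 0 if offlining may proceed, or a negative errno value otherwise. */
int model_cpu_down_check(const struct hotplug_model *st)
{
	assert(st != NULL);

	/* Permanent condition is tested first and reported distinctly. */
	if (!st->supported)
		return -EOPNOTSUPP;

	/* Transient condition: hotplug exists but is currently held off. */
	if (st->disabled)
		return -EBUSY;

	return 0;
}
```

In this model a platform that declared hotplug unsupported always gets -EOPNOTSUPP, whatever the transient counter holds; -EBUSY is only reachable when hotplug is supported but currently disabled.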
Re: [PATCH 09/13] x86/tdx: Account shared memory
> +#ifdef CONFIG_DEBUG_FS
> +static int tdx_shared_memory_show(struct seq_file *m, void *p)
> +{
> +	unsigned long addr, end;
> +	unsigned long found = 0;
> +
> +	addr = PAGE_OFFSET;
> +	end = PAGE_OFFSET + get_max_mapped();
> +
> +	while (addr < end) {
> +		unsigned long size;
> +		unsigned int level;
> +		pte_t *pte;
> +
> +		pte = lookup_address(addr, &level);
> +		size = page_level_size(level);
> +
> +		if (pte && pte_decrypted(*pte))
> +			found += size / PAGE_SIZE;
> +
> +		addr += size;

This could be a long loop, perhaps add cond_resched() here?

> +	}
> +
> +	seq_printf(m, "Number of unshared pages in kernel page tables: %16lu\n",
> +		   found);
> +	seq_printf(m, "Number of pages accounted as unshared: %16ld\n",
> +		   atomic_long_read(_shared));
> +	return 0;
> +}
> +
Re: [PATCHv8 1/5] powerpc/setup : Enable boot_cpu_hwid for PPC32
Hello Pingfan,

With this patch series applied, the kdump kernel fails to boot on
powerpc with nr_cpus=1.

Console logs:
---
[root]# echo c > /proc/sysrq-trigger
[ 74.783235] sysrq: Trigger a crash
[ 74.783244] Kernel panic - not syncing: sysrq triggered crash
[ 74.783252] CPU: 58 PID: 3838 Comm: bash Kdump: loaded Not tainted 6.6.0-rc5pf-nr-cpus+ #3
[ 74.783259] Hardware name: POWER10 (raw) phyp pSeries
[ 74.783275] Call Trace:
[ 74.783280] [c0020f4ebac0] [c0ed9f38] dump_stack_lvl+0x6c/0x9c (unreliable)
[ 74.783291] [c0020f4ebaf0] [c0150300] panic+0x178/0x438
[ 74.783298] [c0020f4ebb90] [c0936d48] sysrq_handle_crash+0x28/0x30
[ 74.783304] [c0020f4ebbf0] [c093773c] __handle_sysrq+0x10c/0x250
[ 74.783309] [c0020f4ebc90] [c0937fa8] write_sysrq_trigger+0xc8/0x168
[ 74.783314] [c0020f4ebcd0] [c0665d8c] proc_reg_write+0x10c/0x1b0
[ 74.783321] [c0020f4ebd00] [c058da54] vfs_write+0x104/0x4b0
[ 74.783326] [c0020f4ebdc0] [c058dfdc] ksys_write+0x7c/0x140
[ 74.783331] [c0020f4ebe10] [c0033a64] system_call_exception+0x144/0x3a0
[ 74.783337] [c0020f4ebe50] [c000c554] system_call_common+0xf4/0x258
[ 74.783343] --- interrupt: c00 at 0x7fffa0721594
[ 74.783352] NIP: 7fffa0721594 LR: 7fffa0697bf4 CTR:
[ 74.783364] REGS: c0020f4ebe80 TRAP: 0c00 Not tainted (6.6.0-rc5pf-nr-cpus+)
[ 74.783376] MSR: 8280f033 CR: 2802 XER:
[ 74.783394] IRQMASK: 0
[ 74.783394] GPR00: 0004 7c4b6800 7fffa0807300 0001
[ 74.783394] GPR04: 00013549ea60 0002 0010
[ 74.783394] GPR08:
[ 74.783394] GPR12: 7fffa0abaf70 4000 00011a0f9798
[ 74.783394] GPR16: 00011a0f9724 00011a097688 00011a02ff70 00011a0fd568
[ 74.783394] GPR20: 000135554bf0 0001 00011a0aa478 7c4b6a24
[ 74.783394] GPR24: 7c4b6a20 00011a0faf94 0002 00013549ea60
[ 74.783394] GPR28: 0002 7fffa08017a0 00013549ea60 0002
[ 74.783440] NIP [7fffa0721594] 0x7fffa0721594
[ 74.783443] LR [7fffa0697bf4] 0x7fffa0697bf4
[ 74.783447] --- interrupt: c00
I'm in purgatory
[ 0.00] radix-mmu: Page sizes from device-tree:
[ 0.00] radix-mmu: Page size shift = 12 AP=0x0
[ 0.00] radix-mmu: Page size shift = 16 AP=0x5
[ 0.00] radix-mmu: Page size shift = 21 AP=0x1
[ 0.00] radix-mmu: Page size shift = 30 AP=0x2
[ 0.00] Activating Kernel Userspace Access Prevention
[ 0.00] Activating Kernel Userspace Execution Prevention
[ 0.00] radix-mmu: Mapped 0x-0x0001 with 64.0 KiB pages (exec)
[ 0.00] radix-mmu: Mapped 0x0001-0x0020 with 64.0 KiB pages
[ 0.00] radix-mmu: Mapped 0x0020-0x2000 with 2.00 MiB pages
[ 0.00] radix-mmu: Mapped 0x2000-0x2260 with 2.00 MiB pages (exec)
[ 0.00] radix-mmu: Mapped 0x2260-0x4000 with 2.00 MiB pages
[ 0.00] radix-mmu: Mapped 0x4000-0x00018000 with 1.00 GiB pages
[ 0.00] radix-mmu: Mapped 0x00018000-0x0001a000 with 2.00 MiB pages
[ 0.00] lpar: Using radix MMU under hypervisor
[ 0.00] Linux version 6.6.0-rc5pf-nr-cpus+ (r...@ltcever7x0-lp1.aus.stglabs.ibm.com) (gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-20), GNU ld version 2.30-123.el8) #3 SMP Mon Oct 9 11:07:41 CDT 2023
[ 0.00] Found initrd at 0xc00022e6:0xc000248f08d8
[ 0.00] Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200 0xf06 of:IBM,FW1060.00 (NM1060_016) hv:phyp pSeries
[ 0.00] printk: bootconsole [udbg0] enabled
[ 0.00] the round shift between dt seq and the cpu logic number: 56
[ 0.00] BUG: Unable to handle kernel data access on write at 0xc001a000
[ 0.00] Faulting instruction address: 0xc00022009c64
[ 0.00] Oops: Kernel access of bad area, sig: 11 [#1]
[ 0.00] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[ 0.00] Modules linked in:
[ 0.00] CPU: 2 PID: 0 Comm: swapper Not tainted 6.6.0-rc5pf-nr-cpus+ #3
[ 0.00] Hardware name: POWER10 (raw) hv:phyp pSeries
[ 0.00] NIP: c00022009c64 LR: c00022009c54 CTR: c000201ff348
[ 0.00] REGS: c00022aebb00 TRAP: 0300 Not tainted (6.6.0-rc5pf-nr-cpus+)
[ 0.00] MSR: 80001033 CR: 28222824 XER: 0001
[ 0.00] CFAR: c00020031574 DAR: c001a000 DSISR: 4200 IRQMASK: 1
[ 0.00] GPR00: c00022009ba0 c00022aebda0 c000213d1300
Re: [PATCHv8 3/5] powerpc/setup: Handle the case when boot_cpuid greater than nr_cpus
On 09/10/23 5:00 pm, Pingfan Liu wrote:
> If the boot_cpuid is smaller than nr_cpus, it requires extra effort to
> ensure the boot_cpu is in cpu_present_mask. This can be achieved by
> reserving the last quota for the boot cpu.
>
> Note: the restriction on nr_cpus will be lifted with more effort in the
> successive patches
>
> Signed-off-by: Pingfan Liu
> Cc: Michael Ellerman
> Cc: Nicholas Piggin
> Cc: Christophe Leroy
> Cc: Mahesh Salgaonkar
> Cc: Wen Xiong
> Cc: Baoquan He
> Cc: Ming Lei
> Cc: kexec@lists.infradead.org
> To: linuxppc-...@lists.ozlabs.org
> ---
>  arch/powerpc/kernel/setup-common.c | 25 ++---
>  1 file changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
> index 81291e13dec0..f9ef0a2666b0 100644
> --- a/arch/powerpc/kernel/setup-common.c
> +++ b/arch/powerpc/kernel/setup-common.c
> @@ -454,8 +454,8 @@ struct interrupt_server_node {
>  void __init smp_setup_cpu_maps(void)
>  {
>  	struct device_node *dn;
> -	int shift = 0, cpu = 0;
> -	int j, nthreads = 1;
> +	int terminate, shift = 0, cpu = 0;
> +	int j, bt_thread = 0, nthreads = 1;
>  	int len;
>  	struct interrupt_server_node *intserv_node, *n;
>  	struct list_head *bt_node, head;
> @@ -518,6 +518,7 @@ void __init smp_setup_cpu_maps(void)
>  		for (j = 0 ; j < nthreads; j++) {
>  			if (be32_to_cpu(intserv[j]) == boot_cpu_hwid) {
>  				bt_node = &intserv_node->node;
> +				bt_thread = j;
>  				found_boot_cpu = true;
>  				/*
>  				 * Record the round-shift between dt
> @@ -537,11 +538,21 @@ void __init smp_setup_cpu_maps(void)
>  	/* Select the primary thread, the boot cpu's slibing, as the logic 0 */
>  	list_add_tail(&head, bt_node);
>  	pr_info("the round shift between dt seq and the cpu logic number: %d\n", shift);
> +	terminate = nr_cpu_ids;
>  	list_for_each_entry(intserv_node, &head, node) {
>
> +		j = 0;
> +		/* Choose a start point to cover the boot cpu */
> +		if (nr_cpu_ids - 1 < bt_thread) {
> +			/*
> +			 * The processor core puts assumption on the thread id,
> +			 * not to breach the assumption.
> +			 */
> +			terminate = nr_cpu_ids - 1;

nthreads is anyway assumed to be same for all cores.
So, enforcing nr_cpu_ids to a minimum of nthreads (and multiple of
nthreads) should make the code much simpler without the need for above
check and the other complexities addressed in the subsequent patches...

Thanks
Hari
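One possible reading of the suggestion above, sketched as standalone C. Rounding up (rather than down) is an assumption made here, and adjust_nr_cpu_ids() is a hypothetical helper, not an existing kernel function.

```c
#include <assert.h>

/*
 * Hypothetical helper: force nr_cpu_ids to be at least nthreads and a
 * whole multiple of nthreads, so that every core is either fully inside
 * or fully outside the logical cpu id range.
 */
unsigned int adjust_nr_cpu_ids(unsigned int nr_cpu_ids, unsigned int nthreads)
{
	assert(nthreads > 0);

	/* Minimum: room for every thread of the boot core. */
	if (nr_cpu_ids < nthreads)
		return nthreads;

	/* Round up to the next multiple of nthreads (assumed direction). */
	return ((nr_cpu_ids + nthreads - 1) / nthreads) * nthreads;
}
```

With 8 threads per core, nr_cpus=1 would become 8 under this sketch, so the boot core's threads would always fit without a special case for the boot thread index.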
Re: [PATCH makedumpfile 0/2] Add riscv64 support for makedumpfile
On 2023/10/07 11:27, Song Shuai wrote:
>
>
> On 2023/10/3 12:22, HAGIO KAZUHITO(萩尾 一仁) wrote:
>> Hi,
>>
>> thank you for the patch.
>>
>> On 2023/09/27 20:18, Song Shuai wrote:
>>> These 2 patches add riscv64 support for makedumpfile:
>>>
>>> Patch1 - Add riscv64 support
>>> ===
>>>
>>> This patch adds support for riscv64 in makedumpfile.
>>> It implements the "vtop" for kernel memory regions
>>> and supports Sv39/Sv48/Sv57 page modes for RV64.
>>
>> Could I have a log of makedumpfile with --message-level 31 option for
>> reference? e.g.
>> makedumpfile -c -d 31 --message-level 31 vmcore dumpfile > mkdf.log
>>
>> (IIRC the kexec mail list doesn't accept attached files, so please send
>> it off-list.)
>
> Sorry for the late reply,
>
> here is the log for the Sv57 and SPARSE_EXTREME kernel:
>
> https://termbin.com/zcf9:
>
> and the log for FLATMEM
>
> https://termbin.com/t89k

Thank you for the information.

>
>>
>>>
>>>
>>> Patch2 - riscv64: Correct the pfn_start for flatmem
>>> ==
>>>
>>> This patch temporarily fixes an issue of the tests about FLATMEM,
>>> as the commit-msg says:
>>> To let info->max_mapnr indicte the direct max PFN and then
>>
>> This means "indicate", right?
>>
> Right, I will fix it if you're OK with Patch2.

The patches look good, so I applied them, fixing that typo and making several indent adjustments.

Thanks,
Kazu

>
>> Thanks,
>> Kazu
>>
>>> make the kdump header's max_mapnr_64 correct, riscv64 port
>>> didn't define ARCH_PFN_OFFSET.
>>> As for FLATMEM type, the pfn region of mem_map_data should
>>> be adjusted to start from info->phys_base instead of zero.
>>>
>>> Not taking other arches into consideration and test, so I simply
>>> judge the __riscv64__ instead of ARCH_PFN_OFFSET. Maybe we can
>>> improve it.
>>>
>>>
>>> Tests
>>> =
>>>
>>> With these 2 patches, the following tests have passed in RV64 Qemu
>>> virt machine:
>>>
>>> Preparation:
>>> ---
>>>
>>> 1. build kernel with FLATMEM and SPARSE memory models
>>> 2. boot kernel with 3 different page-modes by setting no4lvl/no5lvl in
>>>    cmdline
>>> 3. panic kernel
>>>
>>> Tests:
>>> -
>>>
>>> 1. create kdump-compressed file via this command
>>>    - `/mnt/mkdf_f -d31 -f -c /proc/vmcore /mnt/dump.file1`
>>>    - or with `--vtop` option to translate some typical addresses
>>>      (like: kernel_link_addr, vmalloc_start, page_offset)
>>>
>>> 2. start crash with kdump file and do some VTOPs
>>>
>>>
>>> A test log:
>>> ---
>>>
>>> # With the Sv57 and SPARSE_EXTREME kernel
>>> # vtop the vmalloc start address -- 0xff20
>>>
>>>
>>> # /mnt/mkdf_f --vtop 0xff20 -d31 -f --non-mmap -c /proc/vmcore /mnt/dump.file1
>>>
>>> Translating virtual address ff20 to physical address.
>>> VIRTUAL PHYSICAL
>>> ff20 80087000
>>>
>>> Copying data : [100.0 %] | eta: 0s
>>>
>>> The dumpfile is saved to /mnt/dump.file1.
>>>
>>> makedumpfile Completed.
>>>
>>> # sudo ../crash/crash /home/song/9_linux/linux/00_rv_def/vmlinux /tmp/hello/dump.file1
>>> ...
>>> KERNEL: /home/song/9_linux/linux/00_rv_def/vmlinux
>>> DUMPFILE: /tmp/hello/dump.file1 [PARTIAL DUMP]
>>> CPUS: 2
>>> DATE: Wed Sep 27 18:37:45 CST 2023
>>> UPTIME: 00:00:18
>>> LOAD AVERAGE: 0.00, 0.00, 0.00
>>> TASKS: 55
>>> NODENAME: (none)
>>> RELEASE: 6.6.0-rc1-7-g22bfc766389c
>>> VERSION: #1 SMP Mon Sep 25 19:29:05 CST 2023
>>> MACHINE: riscv64 (unknown Mhz)
>>> MEMORY: 511.8 MB
>>> PANIC: "Kernel panic - not syncing: sysrq triggered crash"
>>> PID: 1
>>> COMMAND: "sh"
>>> TASK: ff6e [THREAD_INFO: ff6e]
>>> CPU: 1
>>> STATE: TASK_RUNNING (PANIC)
>>>
>>> crash> vtop 0xff20
>>> VIRTUAL PHYSICAL
>>> ff20 80087000
>>>
>>> PGD: 814fa900 => 20010c01
>>> P4D: 80043000 => 20025401
>>> PUD: 80095000 => 20025801
>>> PMD: 80096000 => 20026001
>>> PTE: 80098000 => 20021ce7
>>> PAGE: 80087000
>>>
>>> PTE PHYSICAL FLAGS
>>> 20021ce7 80087000 (PRESENT|READ|WRITE|GLOBAL|ACCESSED|DIRTY)
>>>
>>> PAGE PHYSICAL MAPPING INDEX CNT FLAGS
>>> ff1c020021c0 80087000 0 0 1 0 // same as the makedumpfile's vtop
>>>
>>>
>>> Song Shuai (2):
>>>   Add riscv64 support
>>>   riscv64: Correct the pfn_start for flatmem
>>>
>>> Makefile | 2 +-
>>> arch/riscv64.c | 219 +
>>> makedumpfile.c | 18
>>> makedumpfile.h | 107
>>> 4 files changed, 345 insertions(+), 1 deletion(-)
>>> create mode 100644 arch/riscv64.c
>
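
For reference, the page-table walk that crash prints in the quoted test log can be checked by hand. The following is only a minimal sketch in C, not the code from arch/riscv64.c. It assumes the standard RISC-V Sv39/Sv48/Sv57 layout (4 KiB pages, 9-bit VPN fields, a 44-bit PPN starting at bit 10 of a PTE), and every rv_* / RV_* name in it is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants for Sv39/Sv48/Sv57 (hypothetical names). */
#define RV_PAGE_SHIFT     12   /* 4 KiB base pages */
#define RV_VPN_BITS       9    /* 512 entries per page table */
#define RV_MAX_LEVELS     5    /* Sv39 = 3, Sv48 = 4, Sv57 = 5 levels */
#define RV_PTE_PPN_SHIFT  10   /* PPN field starts at bit 10 of a PTE */
#define RV_PTE_PPN_MASK   ((UINT64_C(1) << 44) - 1)  /* PPN is 44 bits wide */

#define RV_PTE_V  (UINT64_C(1) << 0)   /* valid (shown as PRESENT by crash) */
#define RV_PTE_R  (UINT64_C(1) << 1)   /* readable */
#define RV_PTE_X  (UINT64_C(1) << 3)   /* executable */

/* Index into the page table at `level` (0 = last-level PTE table). */
static inline unsigned int rv_vpn_index(uint64_t vaddr, int level)
{
    assert(level >= 0 && level < RV_MAX_LEVELS);
    return (unsigned int)((vaddr >> (RV_PAGE_SHIFT + RV_VPN_BITS * level))
                          & ((1u << RV_VPN_BITS) - 1));
}

/* Physical address held in a PTE: the next table's base for a non-leaf
 * entry, or the mapped page's base for a leaf entry. */
static inline uint64_t rv_pte_to_phys(uint64_t pte)
{
    return ((pte >> RV_PTE_PPN_SHIFT) & RV_PTE_PPN_MASK) << RV_PAGE_SHIFT;
}

/* A valid PTE with R or X set maps a page; otherwise it points to the
 * next-level table. */
static inline int rv_pte_is_leaf(uint64_t pte)
{
    return (pte & RV_PTE_V) && (pte & (RV_PTE_R | RV_PTE_X));
}
```

Fed the values from the log above, rv_pte_to_phys(0x20021ce7) gives 0x80087000, the PAGE address reported by both makedumpfile and crash, and the PGD entry 0x20010c01 gives 0x80043000, the P4D table base on the next line of the walk.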