Re: [PATCH 0/7] Kexec-tools: Improve RISC-V port

2023-10-10 Thread Song Shuai



On 2023/9/20 19:56, Simon Horman wrote:

On Fri, Sep 15, 2023 at 11:50:06AM +0800, Song Shuai wrote:

Hi,

This series is created to improve RISC-V port of kexec-tools,
and is based on the horms/kexec-tools:build-test-riscv-v2 branch.


In my mind the big question is how to move RISC-V support
from that branch, to being merged into main.

IIRC there were some issues that needed to be addressed.
Perhaps they are all addressed by this series, and with
some appropriate squashing we can move forwards with a series
based on main?


Hi, Simon and Nick:

I squashed the first four patches into a single "RISC-V: Some fixes for riscv 
port" patch, then took horms/main as the base and collected the 2 
patches from the horms/build-test-riscv-v2 branch together with this series. 
Here are the GitHub link and all the commits for RISC-V:


https://github.com/sugarfillet/kexec-tools/commits/main_rv

5dc133e RISC-V: Support loading Image binary file
b042f6d RISC-V: Separate elf_riscv_find_pbase out
8f344c7 RISC-V: Enable kexec_file_load syscall
7d4b982 RISC-V: Some fixes for riscv port
3205c1c local: RISC-V: distribute purgatory/riscv/Makefile
54f9daf RISC-V: Add support for riscv kexec/kdump on kexec-tools

Since I didn't find the issues/fixes that Nick mentioned in these 
commits, I would prefer to merge them into horms/main and let more kexec/kdump 
users help improve and fix up the RISC-V port.


I would like to hear your advice.





For your convenience, here is my Github branch for kexec-tools:
https://github.com/sugarfillet/kexec-tools/commits/rv-Image

The first four patches fix some build or runtime issues:

   RISC-V: Use linux,usable-memory-range for crash kernel
   RISC-V: Fix the undeclared ‘EM_RISCV’ build failure
   RISC-V: Get memory ranges from iomem
   RISC-V: Correct the usage of command line option

The last three patches enable the kexec_file_load syscall to load
vmlinux and support loading Image binary file for two syscalls.

   RISC-V: Enable kexe_file_load
   RISC-V: Separate elf_riscv_find_pbase out
   RISC-V: Support loading Image binary file

Note that:

RISC-V Linux support for loading an Image file via kexec_file_load has been
sent out but is not yet merged [1].

[1]: 
https://lore.kernel.org/linux-riscv/20230914020044.1397356-1-songshuaish...@tinylab.org/T/#t

Li Zhengyu (1):
   RISC-V: Enable kexe_file_load

Song Shuai (6):
   RISC-V: Use linux,usable-memory-range for crash kernel
   RISC-V: Fix the undeclared ‘EM_RISCV’ build failure
   RISC-V: Get memory ranges from iomem
   RISC-V: Correct the usage of command line option
   RISC-V: Separate elf_riscv_find_pbase out
   RISC-V: Support loading Image binary file

  kexec/arch/riscv/Makefile            |   2 +
  kexec/arch/riscv/crashdump-riscv.c   |   2 +-
  kexec/arch/riscv/image-header.h      |  88 ++
  kexec/arch/riscv/iomem.h             |  10 ++
  kexec/arch/riscv/kexec-elf-riscv.c   |  77 +---
  kexec/arch/riscv/kexec-image-riscv.c |  95 +++
  kexec/arch/riscv/kexec-riscv.c       | 176 ++-
  kexec/arch/riscv/kexec-riscv.h       |  21
  kexec/kexec-syscall.h                |   3 +
  9 files changed, 368 insertions(+), 106 deletions(-)
  create mode 100644 kexec/arch/riscv/image-header.h
  create mode 100644 kexec/arch/riscv/iomem.h
  create mode 100644 kexec/arch/riscv/kexec-image-riscv.c

--
2.20.1





--
Thanks
Song Shuai

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH] kexec/loongarch64: fix 'make dist' file loss issue

2023-10-10 Thread Ming Wang
The Makefile omits iomem.h from the dist list, so the archive
generated by 'make dist' is missing iomem.h. Add it to fix
this problem.

Signed-off-by: Ming Wang 
---
 kexec/arch/loongarch/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kexec/arch/loongarch/Makefile b/kexec/arch/loongarch/Makefile
index cee7e56..f91d0ba 100644
--- a/kexec/arch/loongarch/Makefile
+++ b/kexec/arch/loongarch/Makefile
@@ -19,5 +19,6 @@ loongarch_VIRT_TO_PHYS =
 dist += kexec/arch/loongarch/Makefile $(loongarch_KEXEC_SRCS)          \
        kexec/arch/loongarch/kexec-loongarch.h                          \
        kexec/arch/loongarch/image-header.h                             \
+       kexec/arch/loongarch/iomem.h                                    \
        kexec/arch/loongarch/crashdump-loongarch.h                      \
        kexec/arch/loongarch/include/arch/options.h

base-commit: 6419b008fde783fd0cc2cc266bd1c9cf35e99a0e
-- 
2.39.2




Re: [kexec-tools] Archive file is missed iomem.h file under loongarch architecture.

2023-10-10 Thread Ming Wang
Hi, Simon

On 10/10/23 21:01, Simon Horman wrote:
> On Mon, Oct 09, 2023 at 05:47:43PM +0800, Ming Wang wrote:
>> Hi,  maintainers,
>>
>>
>> I get the kexec-tools 2.0.27 from 
>> http://kernel.org/pub/linux/utils/kernel/kexec/kexec-tools-2.0.27.tar.gz, 
>>
>> But I noticed that the kexec-tools-2.0.27/kexec/arch/loongarch/iomem.h file 
>> was missing from
>>
>> this archive.
>>
>>
>> This causes build errors in many distributions, like debian.  The error 
>> message is as follows,
>>
>> make[1]: *** [Makefile:123: kexec/arch/loongarch/crashdump-loongarch.o] 
>> Error 1
>> kexec/arch/loongarch/kexec-loongarch.c:27:10: fatal error: iomem.h: No such 
>> file or directory
>>27 | #include "iomem.h"
>>
>> See also: 
>> https://buildd.debian.org/status/package.php?p=kexec-tools=sid
>>
>>
>> Can this archive be repaired and updated?
>>
>>
>> Thanks, Ming
> Hi,
>
> I need to think about how to deal with this from a release PoV.
> But can you check if the patch below resolves your problem?
>
> diff --git a/kexec/arch/loongarch/Makefile b/kexec/arch/loongarch/Makefile
> index cee7e569a2a2..f91d0baf049a 100644
> --- a/kexec/arch/loongarch/Makefile
> +++ b/kexec/arch/loongarch/Makefile
> @@ -19,5 +19,6 @@ loongarch_VIRT_TO_PHYS =
>  dist += kexec/arch/loongarch/Makefile $(loongarch_KEXEC_SRCS)         \
>       kexec/arch/loongarch/kexec-loongarch.h                           \
>       kexec/arch/loongarch/image-header.h                              \
> +     kexec/arch/loongarch/iomem.h                                     \
>       kexec/arch/loongarch/crashdump-loongarch.h                       \
>       kexec/arch/loongarch/include/arch/options.h

After applying this patch and running make dist, the result is OK.

Sorry, that was my mistake. This fixes the problem of the missing file.






Re: [PATCHv8 2/5] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt

2023-10-10 Thread Pingfan Liu
On Tue, Oct 10, 2023 at 04:07:00PM +0530, Hari Bathini wrote:
> 
> 
> On 09/10/23 5:00 pm, Pingfan Liu wrote:
> > *** Idea ***
> > For kexec -p, the boot cpu may not be cpu0, which causes a problem when
> > allocating memory for paca_ptrs[]. In theory, however, there is no
> > requirement that a cpu's logical id match its sequence in the
> > device tree. But helpers such as cpu_first_thread_sibling()
> > make assumptions about the mapping inside a core. Hence this patch
> > partially loosens the mapping, i.e. it unbinds the mapping of cores while
> > keeping the mapping inside a core.
> > 
> > *** Implement ***
> > At this early stage, there is plenty of memory to use. Hence, this
> > patch allocates interim memory to link the cpu info on a list, then
> > reorders the cpus by changing the list head. As a result, there is a rotate
> > shift between the sequence number in the dt and the cpu logical number.
> > 
> > *** Result ***
> > After this patch, a boot-cpu's logical id will always be mapped into the
> > range [0,threads_per_core).
> > 
> > Besides this, at this phase, all threads in the boot core are forced to
> > be onlined. This restriction will be lifted in a later patch with
> > extra effort.
> > 
> > Signed-off-by: Pingfan Liu 
> > Cc: Michael Ellerman 
> > Cc: Nicholas Piggin 
> > Cc: Christophe Leroy 
> > Cc: Mahesh Salgaonkar 
> > Cc: Wen Xiong 
> > Cc: Baoquan He 
> > Cc: Ming Lei 
> > Cc: kexec@lists.infradead.org
> > To: linuxppc-...@lists.ozlabs.org
> > ---
> >   arch/powerpc/kernel/prom.c | 25 +
> >   arch/powerpc/kernel/setup-common.c | 87 +++---
> >   2 files changed, 85 insertions(+), 27 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> > index ec82f5bda908..87272a2d8c10 100644
> > --- a/arch/powerpc/kernel/prom.c
> > +++ b/arch/powerpc/kernel/prom.c
> > @@ -76,7 +76,9 @@ u64 ppc64_rma_size;
> >   unsigned int boot_cpu_node_count __ro_after_init;
> >   #endif
> >   static phys_addr_t first_memblock_size;
> > +#ifdef CONFIG_SMP
> >   static int __initdata boot_cpu_count;
> > +#endif
> >   static int __init early_parse_mem(char *p)
> >   {
> > @@ -331,8 +333,7 @@ static int __init early_init_dt_scan_cpus(unsigned long 
> > node,
> > const __be32 *intserv;
> > int i, nthreads;
> > int len;
> > -   int found = -1;
> > -   int found_thread = 0;
> > +   bool found = false;
> > /* We are scanning "cpu" nodes only */
> > if (type == NULL || strcmp(type, "cpu") != 0)
> > @@ -355,8 +356,15 @@ static int __init early_init_dt_scan_cpus(unsigned 
> > long node,
> > for (i = 0; i < nthreads; i++) {
> > if (be32_to_cpu(intserv[i]) ==
> > fdt_boot_cpuid_phys(initial_boot_params)) {
> > -   found = boot_cpu_count;
> > -   found_thread = i;
> > +   /*
> > +* always map the boot-cpu logical id into the
> > +* range of [0, thread_per_core)
> > +*/
> > +   boot_cpuid = i;
> > +   found = true;
> > +   /* This works around the hole in paca_ptrs[]. */
> > +   if (nr_cpu_ids < nthreads)
> > +   set_nr_cpu_ids(nthreads);
> > }
> >   #ifdef CONFIG_SMP
> > /* logical cpu id is always 0 on UP kernels */
> > @@ -365,14 +373,13 @@ static int __init early_init_dt_scan_cpus(unsigned 
> > long node,
> > }
> > /* Not the boot CPU */
> > -   if (found < 0)
> > +   if (!found)
> > return 0;
> > -   DBG("boot cpu: logical %d physical %d\n", found,
> > -   be32_to_cpu(intserv[found_thread]));
> > -   boot_cpuid = found;
> > +   DBG("boot cpu: logical %d physical %d\n", boot_cpuid,
> > +   be32_to_cpu(intserv[boot_cpuid]));
> > -   boot_cpu_hwid = be32_to_cpu(intserv[found_thread]);
> > +   boot_cpu_hwid = be32_to_cpu(intserv[boot_cpuid]);
> > /*
> >  * PAPR defines "logical" PVR values for cpus that
> > diff --git a/arch/powerpc/kernel/setup-common.c 
> > b/arch/powerpc/kernel/setup-common.c
> > index 1b19a9815672..81291e13dec0 100644
> > --- a/arch/powerpc/kernel/setup-common.c
> > +++ b/arch/powerpc/kernel/setup-common.c
> > @@ -36,6 +36,7 @@
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> >   #include 
> >   #include 
> >   #include 
> > @@ -425,6 +426,13 @@ static void __init cpu_init_thread_core_maps(int tpc)
> >   u32 *cpu_to_phys_id = NULL;
> > +struct interrupt_server_node {
> > +   struct list_head node;
> > +   boolavail;
> > +   int len;
> > +   __be32 *intserv;
> > +};
> > +
> >   /**
> >* setup_cpu_maps - initialize the following cpu maps:
> >*  cpu_possible_mask
> > @@ -446,11 +454,16 @@ u32 *cpu_to_phys_id = NULL;
> >   void __init smp_setup_cpu_maps(void)
> >   {
> > struct device_node *dn;
> > -   int cpu = 0;
> > -   int nthreads = 1;
> > +   int shift = 0, cpu = 

Re: [PATCHv8 3/5] powerpc/setup: Handle the case when boot_cpuid greater than nr_cpus

2023-10-10 Thread Pingfan Liu
On Tue, Oct 10, 2023 at 01:56:13PM +0530, Hari Bathini wrote:
> 
> 
> On 09/10/23 5:00 pm, Pingfan Liu wrote:
> > If the boot_cpuid is not smaller than nr_cpus, extra effort is required to
> > ensure the boot_cpu is in cpu_present_mask. This can be achieved by
> > reserving the last slot for the boot cpu.
> > 
> > Note: the restriction on nr_cpus will be lifted with more effort in the
> > successive patches
> > 
> > Signed-off-by: Pingfan Liu 
> > Cc: Michael Ellerman 
> > Cc: Nicholas Piggin 
> > Cc: Christophe Leroy 
> > Cc: Mahesh Salgaonkar 
> > Cc: Wen Xiong 
> > Cc: Baoquan He 
> > Cc: Ming Lei 
> > Cc: kexec@lists.infradead.org
> > To: linuxppc-...@lists.ozlabs.org
> > ---
> >   arch/powerpc/kernel/setup-common.c | 25 ++---
> >   1 file changed, 22 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/setup-common.c 
> > b/arch/powerpc/kernel/setup-common.c
> > index 81291e13dec0..f9ef0a2666b0 100644
> > --- a/arch/powerpc/kernel/setup-common.c
> > +++ b/arch/powerpc/kernel/setup-common.c
> > @@ -454,8 +454,8 @@ struct interrupt_server_node {
> >   void __init smp_setup_cpu_maps(void)
> >   {
> > struct device_node *dn;
> > -   int shift = 0, cpu = 0;
> > -   int j, nthreads = 1;
> > +   int terminate, shift = 0, cpu = 0;
> > +   int j, bt_thread = 0, nthreads = 1;
> > int len;
> > struct interrupt_server_node *intserv_node, *n;
> > struct list_head *bt_node, head;
> > @@ -518,6 +518,7 @@ void __init smp_setup_cpu_maps(void)
> > for (j = 0 ; j < nthreads; j++) {
> > if (be32_to_cpu(intserv[j]) == boot_cpu_hwid) {
> > bt_node = _node->node;
> > +   bt_thread = j;
> > found_boot_cpu = true;
> > /*
> >  * Record the round-shift between dt
> > @@ -537,11 +538,21 @@ void __init smp_setup_cpu_maps(void)
> > /* Select the primary thread, the boot cpu's slibing, as the logic 0 */
> > list_add_tail(, bt_node);
> > pr_info("the round shift between dt seq and the cpu logic number: 
> > %d\n", shift);
> > +   terminate = nr_cpu_ids;
> > list_for_each_entry(intserv_node, , node) {
> > +   j = 0;
> 
> > +   /* Choose a start point to cover the boot cpu */
> > +   if (nr_cpu_ids - 1 < bt_thread) {
> > +   /*
> > +* The processor core puts assumption on the thread id,
> > +* not to breach the assumption.
> > +*/
> > +   terminate = nr_cpu_ids - 1;
> 
> nthreads is anyway assumed to be same for all cores. So, enforcing
> nr_cpu_ids to a minimum of nthreads (and multiple of nthreads) should
> make the code much simpler without the need for above check and the
> other complexities addressed in the subsequent patches...
> 

Indeed, this series can be split into two parts, [1-2/5] and [3-5/5].
In [1-2/5], if nr_cpu_ids is smaller than nthreads, it is forced to be equal
to nthreads. I will make it align upward on nthreads in the next version.
So [1-2/5] can be totally independent of the remaining patches in this
series.
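
As a rough illustration of the upward alignment on nthreads mentioned above, here is a minimal C sketch. The helper name align_nr_cpu_ids() is hypothetical and is not taken from the series; it only shows the arithmetic of rounding nr_cpu_ids up to the next multiple of nthreads.

```c
/*
 * Hypothetical helper (not from the patch series): round nr_cpu_ids up
 * to the next multiple of nthreads, so that a whole core always fits.
 * Assumes nthreads > 0.
 */
static unsigned int align_nr_cpu_ids(unsigned int nr_cpu_ids,
				     unsigned int nthreads)
{
	return ((nr_cpu_ids + nthreads - 1) / nthreads) * nthreads;
}
```

Under this rule, with SMT=4, nr_cpus=1 would become 4 and nr_cpus=5 would become 8.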


From an engineer's perspective, [3-5/5] are added to maintain the
nr_cpus semantics. (Ultimately, nr_cpus=1 can be achieved, but it requires
effort in other subsystems.)


Testing result on my Power9 machine with SMT=4

-1. taskset -c 4 bash -c 'echo c > /proc/sysrq-trigger'

kdump:/# cat /proc/meminfo | grep Percpu
Percpu:  896 kB
kdump:/# cat /sys/devices/system/cpu/possible
0


-2. taskset -c 5 bash -c 'echo c > /proc/sysrq-trigger'

kdump:/# cat /proc/meminfo | grep Percpu
Percpu: 1792 kB
kdump:/# cat /sys/devices/system/cpu/possible
0-1



-3. taskset -c 6 bash -c 'echo c > /proc/sysrq-trigger'

kdump:/# cat /proc/meminfo | grep Percpu
Percpu: 1792 kB
kdump:/# cat /sys/devices/system/cpu/possible
0,2


-4. taskset -c 7 bash -c 'echo c > /proc/sysrq-trigger'

kdump:/# cat /proc/meminfo | grep Percpu
Percpu: 1792 kB
kdump:/# cat /sys/devices/system/cpu/possible
0,3
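
The four results above are consistent with the boot cpu's logical id being its thread index inside its core, i.e. a value in [0, threads_per_core). A minimal sketch of that arithmetic follows. It assumes that cpu numbers are contiguous within a core, which is an assumption made only for illustration (the kernel takes the thread list from the device tree), and the helper name is hypothetical.

```c
/*
 * Hypothetical helper: thread index of a cpu inside its core, assuming
 * cpus are numbered contiguously per core.
 * Assumes threads_per_core > 0.
 */
static unsigned int thread_index_in_core(unsigned int cpu,
					 unsigned int threads_per_core)
{
	return cpu % threads_per_core;
}
```

For SMT=4, crashing on cpu 4, 5, 6 and 7 gives thread indexes 0, 1, 2 and 3, which matches the highest possible cpu reported in each of the four tests.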


Thanks,
Pingfan






Re: [kexec-tools] Archive file is missed iomem.h file under loongarch architecture.

2023-10-10 Thread Ming Wang
Hi, Simon

Thank you for your reply.

On 10/10/23 21:01, Simon Horman wrote:
> On Mon, Oct 09, 2023 at 05:47:43PM +0800, Ming Wang wrote:
>> Hi,  maintainers,
>>
>>
>> I get the kexec-tools 2.0.27 from 
>> http://kernel.org/pub/linux/utils/kernel/kexec/kexec-tools-2.0.27.tar.gz, 
>>
>> But I noticed that the kexec-tools-2.0.27/kexec/arch/loongarch/iomem.h file 
>> was missing from
>>
>> this archive.
>>
>>
>> This causes build errors in many distributions, like debian.  The error 
>> message is as follows,
>>
>> make[1]: *** [Makefile:123: kexec/arch/loongarch/crashdump-loongarch.o] 
>> Error 1
>> kexec/arch/loongarch/kexec-loongarch.c:27:10: fatal error: iomem.h: No such 
>> file or directory
>>27 | #include "iomem.h"
>>
>> See also: 
>> https://buildd.debian.org/status/package.php?p=kexec-tools=sid
>>
>>
>> Can this archive be repaired and updated?
>>
>>
>> Thanks, Ming
> Hi,
>
> I need to think about how to deal with this from a release PoV.
> But can you check if the patch below resolves your problem?

Thanks for the patch; I think it is necessary.


But it doesn't solve my problem. My goal is to port kexec-tools for the
loongarch architecture to Debian.

However, the Debian community's automatic build system pulls the source from
http://kernel.org/pub/linux/utils/kernel/kexec/kexec-tools-2.0.27.tar.gz


So my problem may still require updating the archive.
Alternatively, I can wait for version 2.0.28 and then continue the Debian
porting work.

Will this problem be fixed in 2.0.28?

>
> diff --git a/kexec/arch/loongarch/Makefile b/kexec/arch/loongarch/Makefile
> index cee7e569a2a2..f91d0baf049a 100644
> --- a/kexec/arch/loongarch/Makefile
> +++ b/kexec/arch/loongarch/Makefile
> @@ -19,5 +19,6 @@ loongarch_VIRT_TO_PHYS =
>  dist += kexec/arch/loongarch/Makefile $(loongarch_KEXEC_SRCS)         \
>       kexec/arch/loongarch/kexec-loongarch.h                           \
>       kexec/arch/loongarch/image-header.h                              \
> +     kexec/arch/loongarch/iomem.h                                     \
>       kexec/arch/loongarch/crashdump-loongarch.h                       \
>       kexec/arch/loongarch/include/arch/options.h




Re: [PATCHv8 1/5] powerpc/setup : Enable boot_cpu_hwid for PPC32

2023-10-10 Thread Pingfan Liu
On Tue, Oct 10, 2023 at 02:38:40PM +0530, Sourabh Jain wrote:
> Hello Pingfan,
> 
> > 
> > With this patch series applied, the kdump kernel fails to boot on
> > powerpc with nr_cpus=1.
> > 
> > Console logs:
> > ---
> > [root]# echo c > /proc/sysrq-trigger
> > [   74.783235] sysrq: Trigger a crash
> > [   74.783244] Kernel panic - not syncing: sysrq triggered crash
> > [   74.783252] CPU: 58 PID: 3838 Comm: bash Kdump: loaded Not tainted
> > 6.6.0-rc5pf-nr-cpus+ #3
> > [   74.783259] Hardware name: POWER10 (raw) phyp pSeries
> > [   74.783275] Call Trace:
> > [   74.783280] [c0020f4ebac0] [c0ed9f38]
> > dump_stack_lvl+0x6c/0x9c (unreliable)
> > [   74.783291] [c0020f4ebaf0] [c0150300] panic+0x178/0x438
> > [   74.783298] [c0020f4ebb90] [c0936d48]
> > sysrq_handle_crash+0x28/0x30
> > [   74.783304] [c0020f4ebbf0] [c093773c]
> > __handle_sysrq+0x10c/0x250
> > [   74.783309] [c0020f4ebc90] [c0937fa8]
> > write_sysrq_trigger+0xc8/0x168
> > [   74.783314] [c0020f4ebcd0] [c0665d8c]
> > proc_reg_write+0x10c/0x1b0
> > [   74.783321] [c0020f4ebd00] [c058da54]
> > vfs_write+0x104/0x4b0
> > [   74.783326] [c0020f4ebdc0] [c058dfdc]
> > ksys_write+0x7c/0x140
> > [   74.783331] [c0020f4ebe10] [c0033a64]
> > system_call_exception+0x144/0x3a0
> > [   74.783337] [c0020f4ebe50] [c000c554]
> > system_call_common+0xf4/0x258
> > [   74.783343] --- interrupt: c00 at 0x7fffa0721594
> > [   74.783352] NIP:  7fffa0721594 LR: 7fffa0697bf4 CTR:
> > 
> > [   74.783364] REGS: c0020f4ebe80 TRAP: 0c00   Not tainted
> > (6.6.0-rc5pf-nr-cpus+)
> > [   74.783376] MSR:  8280f033
> >   CR: 2802  XER: 
> > [   74.783394] IRQMASK: 0
> > [   74.783394] GPR00: 0004 7c4b6800 7fffa0807300
> > 0001
> > [   74.783394] GPR04: 00013549ea60 0002 0010
> > 
> > [   74.783394] GPR08:   
> > 
> > [   74.783394] GPR12:  7fffa0abaf70 4000
> > 00011a0f9798
> > [   74.783394] GPR16: 00011a0f9724 00011a097688 00011a02ff70
> > 00011a0fd568
> > [   74.783394] GPR20: 000135554bf0 0001 00011a0aa478
> > 7c4b6a24
> > [   74.783394] GPR24: 7c4b6a20 00011a0faf94 0002
> > 00013549ea60
> > [   74.783394] GPR28: 0002 7fffa08017a0 00013549ea60
> > 0002
> > [   74.783440] NIP [7fffa0721594] 0x7fffa0721594
> > [   74.783443] LR [7fffa0697bf4] 0x7fffa0697bf4
> > [   74.783447] --- interrupt: c00
> > I'm in purgatory
> > [    0.00] radix-mmu: Page sizes from device-tree:
> > [    0.00] radix-mmu: Page size shift = 12 AP=0x0
> > [    0.00] radix-mmu: Page size shift = 16 AP=0x5
> > [    0.00] radix-mmu: Page size shift = 21 AP=0x1
> > [    0.00] radix-mmu: Page size shift = 30 AP=0x2
> > [    0.00] Activating Kernel Userspace Access Prevention
> > [    0.00] Activating Kernel Userspace Execution Prevention
> > [    0.00] radix-mmu: Mapped 0x-0x0001
> > with 64.0 KiB pages (exec)
> > [    0.00] radix-mmu: Mapped 0x0001-0x0020
> > with 64.0 KiB pages
> > [    0.00] radix-mmu: Mapped 0x0020-0x2000
> > with 2.00 MiB pages
> > [    0.00] radix-mmu: Mapped 0x2000-0x2260
> > with 2.00 MiB pages (exec)
> > [    0.00] radix-mmu: Mapped 0x2260-0x4000
> > with 2.00 MiB pages
> > [    0.00] radix-mmu: Mapped 0x4000-0x00018000
> > with 1.00 GiB pages
> > [    0.00] radix-mmu: Mapped 0x00018000-0x0001a000
> > with 2.00 MiB pages
> > [    0.00] lpar: Using radix MMU under hypervisor
> > [    0.00] Linux version 6.6.0-rc5pf-nr-cpus+
> > (r...@ltcever7x0-lp1.aus.stglabs.ibm.com) (gcc (GCC) 8.5.0 20210514 (Red
> > Hat 8.5.0-20), GNU ld version 2.30-123.el8) #3 SMP Mon Oct  9 11:07:
> > 41 CDT 2023
> > [    0.00] Found initrd at 0xc00022e6:0xc000248f08d8
> > [    0.00] Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200
> > 0xf06 of:IBM,FW1060.00 (NM1060_016) hv:phyp pSeries
> > [    0.00] printk: bootconsole [udbg0] enabled
> > [    0.00] the round shift between dt seq and the cpu logic number:
> > 56
> > [    0.00] BUG: Unable to handle kernel data access on write at
> > 0xc001a000
> > [    0.00] Faulting instruction address: 0xc00022009c64
> > [    0.00] Oops: Kernel access of bad area, sig: 11 [#1]
> > [    0.00] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> > [    0.00] Modules linked in:
> > [    0.00] CPU: 2 PID: 0 Comm: swapper Not tainted
> > 6.6.0-rc5pf-nr-cpus+ #3
> > [    0.00] Hardware name:  POWER10 (raw)  

Re: [PATCH makedumpfile V2 0/2] Add riscv64 support for makedumpfile

2023-10-10 Thread 萩尾 一仁
On 2023/10/10 23:12, Song Shuai wrote:
> Changes since V1:
> https://lore.kernel.org/kexec/20230927111822.180630-1-songshuaish...@tinylab.org/
> 
> - fix a typo in Patch2's commit-msg
> - adjust some indentions of Patch1

Thank you, but I have already applied the v1 patches with fixes on my end:
https://github.com/makedumpfile/makedumpfile/compare/a34f017...aee7f3b

I should have sent this link earlier, sorry about that.

Thanks,
Kazu

> 
> 
> These 2 patches add riscv64 support for makedumpfile:
> 
> Patch1 - Add riscv64 support
> ===
> 
> This patch adds support for riscv64 in makedumpfile.
> It implements "vtop" for kernel memory regions
> and supports the Sv39/Sv48/Sv57 page modes for RV64.
> 
> 
> Patch2 - riscv64: Correct the pfn_start for flatmem
> ==
> 
> This patch temporarily fixes an issue seen in the FLATMEM tests,
> as the commit message says:
>
>  To let info->max_mapnr indicate the direct max PFN and then
>  make the kdump header's max_mapnr_64 correct, riscv64 port
>  didn't define ARCH_PFN_OFFSET.
>  
>  As for FLATMEM type, the pfn region of mem_map_data should
>  be adjusted to start from info->phys_base instead of zero.
> 
> 
> Tests
> =
> 
> With these 2 patches, the following tests passed on an RV64 QEMU virt
> machine:
> 
> Preparation:
> ---
> 
> 1. build kernel with FLATMEM and SPARSE memory models
> 2. boot kernel with 3 different page-modes by setting no4lvl/no5lvl in cmdline
> 3. panic kernel
> 
> Tests:
> -
> 
> 1. create kdump-compressed file via this command
> - `/mnt/mkdf_f -d31 -f -c /proc/vmcore /mnt/dump.file1`
> - or with `--vtop` option to translate some typical addresses (like:
>   kernel_link_addr, vmalloc_start, page_offset)
> 
> 2. start crash with kdump file and do some VTOPs
> 
> 
> A test log:
> ---
> 
> # With the Sv57 and SPARSE_EXTREME kernel
> # vtop the vmalloc start address -- 0xff20
> 
> 
> # /mnt/mkdf_f  --vtop 0xff20 -d31 -f --non-mmap -c /proc/vmcore 
> /mnt/dump.file1
> 
> Translating virtual address ff20 to physical address.
> VIRTUAL   PHYSICAL
> ff20  80087000
> 
> Copying data  : [100.0 %] |
> eta: 0s
> 
> The dumpfile is saved to /mnt/dump.file1.
> 
> makedumpfile Completed.
> 
> # sudo ../crash/crash /home/song/9_linux/linux/00_rv_def/vmlinux 
> /tmp/hello/dump.file1
> ...
>KERNEL: /home/song/9_linux/linux/00_rv_def/vmlinux
>  DUMPFILE: /tmp/hello/dump.file1  [PARTIAL DUMP]
>  CPUS: 2
>  DATE: Wed Sep 27 18:37:45 CST 2023
>UPTIME: 00:00:18
> LOAD AVERAGE: 0.00, 0.00, 0.00
> TASKS: 55
>  NODENAME: (none)
>   RELEASE: 6.6.0-rc1-7-g22bfc766389c
>   VERSION: #1 SMP Mon Sep 25 19:29:05 CST 2023
>   MACHINE: riscv64  (unknown Mhz)
>MEMORY: 511.8 MB
> PANIC: "Kernel panic - not syncing: sysrq triggered crash"
>   PID: 1
>   COMMAND: "sh"
>  TASK: ff6e  [THREAD_INFO: ff6e]
>   CPU: 1
> STATE: TASK_RUNNING (PANIC)
> 
> crash> vtop 0xff20
> VIRTUAL   PHYSICAL
> ff20  80087000
> 
>PGD: 814fa900 => 20010c01
>P4D: 80043000 => 20025401
>PUD: 80095000 => 20025801
>PMD: 80096000 => 20026001
>PTE: 80098000 => 20021ce7
>   PAGE: 80087000
> 
>PTE PHYSICAL  FLAGS
> 20021ce7  80087000  (PRESENT|READ|WRITE|GLOBAL|ACCESSED|DIRTY)
> 
>PAGE   PHYSICAL  MAPPING   INDEX CNT FLAGS
> ff1c020021c0 8008700000  1 0  // same as the 
> makedumpfile's vtop
> 
> Song Shuai (2):
>Add riscv64 support
>riscv64: Correct the pfn_start for flatmem
> 
>   Makefile   |   2 +-
>   arch/riscv64.c | 219 +
>   makedumpfile.c |  18 
>   makedumpfile.h | 107 
>   4 files changed, 345 insertions(+), 1 deletion(-)
>   create mode 100644 arch/riscv64.c
> 


Re: [PATCH] kexec: Fix reboot race during device_shutdown()

2023-10-10 Thread Eric W. Biederman
Joel Fernandes  writes:

> On Mon, Oct 9, 2023 at 11:30 AM Eric W. Biederman  
> wrote:
>>
>> Joel Fernandes  writes:
>>
>> > On Mon, Oct 2, 2023 at 2:18 PM Joel Fernandes  
>> > wrote:
>> > [..]
>> >> > > Such freezing is already being done if kernel supports KEXEC_JUMP and
>> >> > > kexec_image->preserve_context is true. However, doing it if either of 
>> >> > > these are
>> >> > > not true prevents crashes/races.
>> >> >
>> >> > The KEXEC_JUMP case is something else entirely.  It is supposed to work
>> >> > like suspend to RAM.  Maybe reboot should as well, but I am
>> >> > uncomfortable making a generic device fix kexec specific.
>> >>
>> >> I see your point of view. I think regular reboot should also be fixed
>> >> to avoid similar crash possibilities. I am happy to make a change for
>> >> that similar to this patch if we want to proceed that way.
>> >>
>> >> Thoughts?
>> >
>> > Just checking how we want to proceed, is the consensus that we should
>> > prevent kernel crashes without relying on userspace stopping all
>> > processes? Should we fix regular reboot syscall as well and not just
>> > kexec reboot?
>>
>> It just occurred to me there is something very fishy about all of this.
>>
>> What userspace do you have using kexec (not kexec on panic) that doesn't
>> perform the same userspace shutdown as a normal reboot?
>>
>> Quite frankly such a userspace is buggy, and arguably that is where you
>> should start fixing things.
>
> It is a simple unit test that tests kexec support by kexec-rebooting
> the kernel. I don't think SIGSTOP/SIGKILL'ing during kexec-reboot is
> ideal because in a real panic-on-kexec type crash, that may not happen
> and so does not emulate the real world that well. I think we want the
> kexec-reboot to do a *reboot* without crashing the kernel while doing
> so. Ricardo/Steve can chime in on what they feel as well.

This is confusing.  You have a unit test that tests kexec on
panic using the full kexec reboot.

The two are fundamentally similar but you aren't going to have a valid
test case if you mix them.

There is a whole kernel module that tests more interesting cases,
for the simple case you probably just want to do:

echo 'c' > /proc/sysrq-trigger

I believe it is c that causes a kernel panic (p only dumps the registers).

That will ensure you are exercising the kexec on panic code path.  That
performs the minimal shutdown in the kernel.

>> That way you can get the orderly shutdown
>> of userspace daemons/services along with an orderly shutdown of
>> everything the kernel is responsible for.
>
> Fixing in userspace is an option but people are not happy that the
> kernel can crash like that.

In a kexec on panic scenario the kernel needs to perform the absolute
bare essential shutdown before calling kexec (basically nothing).
During kexec-on-panic nothing can be relied upon because we don't know
what is broken.  If that is what you care about (as suggested by the
unit test) you need to fix the device initialization.

In a normal kexec scenario the whole normal reboot process is expected.
I have no problems with fixing the kernel to handle that scenario,
but in the real world the entire orderly shutdown, of both kernel
and userspace, should be performed.

>> At the kernel level a kexec reboot and a normal reboot have been
>> deliberately kept as close as possible.  Which is why I say we should
>> fix it in reboot.
>
> You mean fix it in userspace?

No.  I mean in the kernel the orderly shutdown for a kexec reboot and an
ordinary reboot are kept as close to the same as possible.

It should be the case that the only difference between the two is that
in one case system firmware takes over after the orderly shutdown,
and in the other case a new kernel takes over after the orderly shutdown.

Eric




Re: [PATCH] kexec: Fix reboot race during device_shutdown()

2023-10-10 Thread Joel Fernandes
On Mon, Oct 9, 2023 at 10:00 AM Steven Rostedt  wrote:
>
> On Sat, 7 Oct 2023 21:30:42 -0400
> Joel Fernandes  wrote:
>
> > Just checking how we want to proceed, is the consensus that we should
> > prevent kernel crashes without relying on userspace stopping all
> > processes? Should we fix regular reboot syscall as well and not just
> > kexec reboot?
>
> If you can show that we can trigger the crash on normal reboot, then I
> don't see why not. That is, if you have a program that does the reboot
> (without the SIGSTOP/SIGKILL calls) and triggers this crash, I think that's
> a legitimate reason to fix it on normal reboot too.

Ok, Sounds good, thanks for sharing your thoughts.

 - Joel



Re: [PATCH] kexec: Fix reboot race during device_shutdown()

2023-10-10 Thread Joel Fernandes
On Mon, Oct 9, 2023 at 11:30 AM Eric W. Biederman  wrote:
>
> Joel Fernandes  writes:
>
> > On Mon, Oct 2, 2023 at 2:18 PM Joel Fernandes  
> > wrote:
> > [..]
> >> > > Such freezing is already being done if kernel supports KEXEC_JUMP and
> >> > > kexec_image->preserve_context is true. However, doing it if either of 
> >> > > these are
> >> > > not true prevents crashes/races.
> >> >
> >> > The KEXEC_JUMP case is something else entirely.  It is supposed to work
> >> > like suspend to RAM.  Maybe reboot should as well, but I am
> >> > uncomfortable making a generic device fix kexec specific.
> >>
> >> I see your point of view. I think regular reboot should also be fixed
> >> to avoid similar crash possibilities. I am happy to make a change for
> >> that similar to this patch if we want to proceed that way.
> >>
> >> Thoughts?
> >
> > Just checking how we want to proceed, is the consensus that we should
> > prevent kernel crashes without relying on userspace stopping all
> > processes? Should we fix regular reboot syscall as well and not just
> > kexec reboot?
>
> It just occurred to me there is something very fishy about all of this.
>
> What userspace do you have using kexec (not kexec on panic) that doesn't
> perform the same userspace shutdown as a normal reboot?
>
> Quite frankly such a userspace is buggy, and arguably that is where you
> should start fixing things.

It is a simple unit test that tests kexec support by kexec-rebooting
the kernel. I don't think SIGSTOP/SIGKILL'ing during kexec-reboot is
ideal because in a real panic-on-kexec type crash, that may not happen
and so does not emulate the real world that well. I think we want the
kexec-reboot to do a *reboot* without crashing the kernel while doing
so. Ricardo/Steve can chime in on how they feel as well.

> That way you can get the orderly shutdown
> of userspace daemons/services along with an orderly shutdown of
> everything the kernel is responsible for.

Fixing in userspace is an option but people are not happy that the
kernel can crash like that.

> At the kernel level a kexec reboot and a normal reboot have been
> deliberately kept as close as possible.  Which is why I say we should
> fix it in reboot.

You mean fix it in userspace?

thanks,

 - Joel



[PATCH makedumpfile V2 2/2] riscv64: Correct the pfn_start for flatmem

2023-10-10 Thread Song Shuai
To let info->max_mapnr indicate the direct max PFN and then
make the kdump header's max_mapnr_64 correct, riscv64 port
didn't define ARCH_PFN_OFFSET.

As for FLATMEM type, the pfn region of mem_map_data should
be adjusted to start from info->phys_base instead of zero.

Signed-off-by: Song Shuai 
---
 makedumpfile.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/makedumpfile.c b/makedumpfile.c
index 42d5565..3705bdd 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -3302,7 +3302,11 @@ get_mm_flatmem(void)
if (is_xen_memory())
dump_mem_map(0, info->dom0_mapnr, mem_map, 0);
else
+#ifdef __riscv64__
+   dump_mem_map((info->phys_base >> PAGESHIFT()), info->max_mapnr, mem_map, 0);
+#else
dump_mem_map(0, info->max_mapnr, mem_map, 0);
+#endif
 
return TRUE;
 }
-- 
2.20.1




[PATCH makedumpfile V2 0/2] Add riscv64 support for makedumpfile

2023-10-10 Thread Song Shuai
Changes since V1:
https://lore.kernel.org/kexec/20230927111822.180630-1-songshuaish...@tinylab.org/

- fix a typo in Patch2's commit-msg
- adjust some indentation in Patch1


These 2 patches add riscv64 support for makedumpfile:

Patch1 - Add riscv64 support 
===

This patch adds support for riscv64 in makedumpfile.
It implements the "vtop" for kernel memory regions
and supports Sv39/Sv48/Sv57 page modes for RV64.
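
For readers unfamiliar with the three page modes, here is a minimal sketch
of how the top-level table index depends on the mode. It is not code from
the patch: the sketch_* names are made up for illustration, and it only
assumes the standard RV64 layout of 4 KiB pages with 9 index bits per level.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants (assumed): 4 KiB pages, 9 index bits per level. */
enum { SKETCH_PAGE_SHIFT = 12, SKETCH_LEVEL_BITS = 9 };

/* Number of page-table levels for a given virtual-address width. */
static int sketch_levels(int va_bits)
{
    assert(va_bits == 39 || va_bits == 48 || va_bits == 57);
    return (va_bits - SKETCH_PAGE_SHIFT) / SKETCH_LEVEL_BITS;
}

/* Index into the table at `level` (0 = leaf PTE table, levels - 1 = root). */
static unsigned sketch_pt_index(uint64_t vaddr, int level)
{
    int shift = SKETCH_PAGE_SHIFT + SKETCH_LEVEL_BITS * level;

    return (unsigned)((vaddr >> shift) & ((1u << SKETCH_LEVEL_BITS) - 1));
}

/* Index into the top-level (PGD) table for the active page mode. */
static unsigned sketch_pgd_index(uint64_t vaddr, int va_bits)
{
    return sketch_pt_index(vaddr, sketch_levels(va_bits) - 1);
}
```

With this model the PGD index is taken from bit 30 upward for Sv39, from
bit 39 for Sv48 and from bit 48 for Sv57.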


Patch2 - riscv64: Correct the pfn_start for flatmem
==

This patch is a temporary fix for an issue seen in the FLATMEM tests,
as the commit-msg says:
  
To let info->max_mapnr indicate the direct max PFN and then
make the kdump header's max_mapnr_64 correct, riscv64 port
didn't define ARCH_PFN_OFFSET.

As for FLATMEM type, the pfn region of mem_map_data should
be adjusted to start from info->phys_base instead of zero.
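
A minimal sketch of the arithmetic behind this adjustment, not the patch
itself: the sketch_* helpers are hypothetical, and treating max_mapnr as an
exclusive upper bound is an assumption made for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* First PFN covered by mem_map on FLATMEM: the PFN of phys_base. */
static uint64_t sketch_flatmem_pfn_start(uint64_t phys_base, unsigned page_shift)
{
    assert(page_shift < 64);
    return phys_base >> page_shift;
}

/*
 * Number of pages described by the region [pfn_start, max_mapnr).
 * Assumption: max_mapnr is an exclusive upper bound.
 */
static uint64_t sketch_flatmem_nr_pages(uint64_t phys_base, unsigned page_shift,
                                        uint64_t max_mapnr)
{
    uint64_t start = sketch_flatmem_pfn_start(phys_base, page_shift);

    return max_mapnr > start ? max_mapnr - start : 0;
}
```

For example, with a hypothetical phys_base of 0x80000000 and 4 KiB pages the
region starts at PFN 0x80000 rather than 0, so the PFNs below RAM are not
described at all.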


Tests
=

With these 2 patches, the following tests passed on the RV64 QEMU virt machine:

Preparation: 
---

1. build kernel with FLATMEM and SPARSE memory models 
2. boot kernel with 3 different page modes by setting no4lvl/no5lvl in cmdline
3. panic kernel 

Tests:
-

1. create kdump-compressed file via this command 
   - `/mnt/mkdf_f -d31 -f -c /proc/vmcore /mnt/dump.file1`
   - or with `--vtop` option to translate some typical addresses (like:
 kernel_link_addr, vmalloc_start, page_offset)

2. start crash with kdump file and do some VTOPs 
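
For kernel_link_addr and page_offset, the translation checked by these tests
is the linear one. The sketch below restates the VTOP() macro from Patch1 as
a plain function. The struct and the sketch_* names are hypothetical, and
vmalloc addresses are not covered here because they need a page-table walk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the values the patch reads via NUMBER(). */
struct sketch_layout {
    uint64_t kernel_link_addr;
    uint64_t va_kernel_pa_offset;
    uint64_t page_offset;
    uint64_t phys_base;
};

/* Linear translation: kernel image mapping first, direct map otherwise. */
static uint64_t sketch_vtop_direct(const struct sketch_layout *l, uint64_t vaddr)
{
    assert(l != NULL);
    if (vaddr >= l->kernel_link_addr)
        return vaddr - l->va_kernel_pa_offset;
    return vaddr - l->page_offset + l->phys_base;
}
```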


A test log: 
---

# With the Sv57 and SPARSE_EXTREME kernel  
# vtop the vmalloc start address -- 0xff20


# /mnt/mkdf_f  --vtop 0xff20 -d31 -f --non-mmap -c /proc/vmcore /mnt/dump.file1

Translating virtual address ff20 to physical address.
VIRTUAL   PHYSICAL
ff20  80087000

Copying data  : [100.0 %] |
eta: 0s

The dumpfile is saved to /mnt/dump.file1.

makedumpfile Completed.

# sudo ../crash/crash /home/song/9_linux/linux/00_rv_def/vmlinux /tmp/hello/dump.file1
...
  KERNEL: /home/song/9_linux/linux/00_rv_def/vmlinux
DUMPFILE: /tmp/hello/dump.file1  [PARTIAL DUMP]
CPUS: 2
DATE: Wed Sep 27 18:37:45 CST 2023
  UPTIME: 00:00:18
LOAD AVERAGE: 0.00, 0.00, 0.00
   TASKS: 55
NODENAME: (none)
 RELEASE: 6.6.0-rc1-7-g22bfc766389c
 VERSION: #1 SMP Mon Sep 25 19:29:05 CST 2023
 MACHINE: riscv64  (unknown Mhz)
  MEMORY: 511.8 MB
   PANIC: "Kernel panic - not syncing: sysrq triggered crash"
 PID: 1
 COMMAND: "sh"
TASK: ff6e  [THREAD_INFO: ff6e]
 CPU: 1
   STATE: TASK_RUNNING (PANIC)

crash> vtop 0xff20
VIRTUAL   PHYSICAL
ff20  80087000

  PGD: 814fa900 => 20010c01
  P4D: 80043000 => 20025401
  PUD: 80095000 => 20025801
  PMD: 80096000 => 20026001
  PTE: 80098000 => 20021ce7
 PAGE: 80087000

  PTE PHYSICAL  FLAGS
20021ce7  80087000  (PRESENT|READ|WRITE|GLOBAL|ACCESSED|DIRTY)  

  PAGE   PHYSICAL  MAPPING   INDEX CNT FLAGS
ff1c020021c0 8008700000  1 0  // same as the makedumpfile's vtop
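
The PTE line in the log above can be checked by hand. Here is a minimal
sketch, assuming the standard RV64 PTE layout (flags in bits 7:0, PPN in
bits 53:10) and 4 KiB pages. The SKETCH_* and sketch_* names are
illustrative and are not identifiers from makedumpfile or crash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* RV64 PTE flag bits (low byte of the entry). */
#define SKETCH_PTE_V (1u << 0) /* valid / present */
#define SKETCH_PTE_R (1u << 1) /* readable */
#define SKETCH_PTE_W (1u << 2) /* writable */
#define SKETCH_PTE_X (1u << 3) /* executable */
#define SKETCH_PTE_U (1u << 4) /* user */
#define SKETCH_PTE_G (1u << 5) /* global */
#define SKETCH_PTE_A (1u << 6) /* accessed */
#define SKETCH_PTE_D (1u << 7) /* dirty */

/* Physical address of the page (or next-level table) an entry points to. */
static uint64_t sketch_pte_to_phys(uint64_t pte)
{
    assert(pte & SKETCH_PTE_V);
    /* PPN occupies bits 53:10; shift it up by the 4 KiB page shift. */
    return ((pte >> 10) & ((1ULL << 44) - 1)) << 12;
}

/* Low flag byte of the entry. */
static unsigned sketch_pte_flags(uint64_t pte)
{
    return (unsigned)(pte & 0xff);
}

/* An entry is a leaf if any of R/W/X is set; otherwise it points to a table. */
static bool sketch_pte_is_leaf(uint64_t pte)
{
    return (pte & (SKETCH_PTE_R | SKETCH_PTE_W | SKETCH_PTE_X)) != 0;
}
```

Feeding in the logged PTE value 20021ce7 gives physical address 80087000 and
the flag set V|R|W|G|A|D, which agrees with both vtop outputs. The logged PMD
entry 20026001 decodes to the PTE table address 80098000.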

Song Shuai (2):
  Add riscv64 support
  riscv64: Correct the pfn_start for flatmem

 Makefile   |   2 +-
 arch/riscv64.c | 219 +
 makedumpfile.c |  18 
 makedumpfile.h | 107 
 4 files changed, 345 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv64.c

-- 
2.20.1




[PATCH makedumpfile V2 1/2] Add riscv64 support

2023-10-10 Thread Song Shuai
This patch adds support for riscv64 in makedumpfile.
It implements the "vtop" for kernel memory regions
and supports Sv39/Sv48/Sv57 page modes for RV64.

Signed-off-by: Song Shuai 
---
 Makefile   |   2 +-
 arch/riscv64.c | 219 +
 makedumpfile.c |  14 
 makedumpfile.h | 107 
 4 files changed, 341 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv64.c

diff --git a/Makefile b/Makefile
index 0608035..1d0644c 100644
--- a/Makefile
+++ b/Makefile
@@ -47,7 +47,7 @@ endif
 SRC_BASE = makedumpfile.c makedumpfile.h diskdump_mod.h sadump_mod.h sadump_info.h
 SRC_PART = print_info.c dwarf_info.c elf_info.c erase_info.c sadump_info.c cache.c tools.c printk.c detect_cycle.c
 OBJ_PART=$(patsubst %.c,%.o,$(SRC_PART))
-SRC_ARCH = arch/arm.c arch/arm64.c arch/x86.c arch/x86_64.c arch/ia64.c arch/ppc64.c arch/s390x.c arch/ppc.c arch/sparc64.c arch/mips64.c arch/loongarch64.c
+SRC_ARCH = arch/arm.c arch/arm64.c arch/x86.c arch/x86_64.c arch/ia64.c arch/ppc64.c arch/s390x.c arch/ppc.c arch/sparc64.c arch/mips64.c arch/loongarch64.c arch/riscv64.c
 OBJ_ARCH=$(patsubst %.c,%.o,$(SRC_ARCH))
 
 LIBS = -ldw -lbz2 -ldl -lelf -lz
diff --git a/arch/riscv64.c b/arch/riscv64.c
new file mode 100644
index 000..b4101e7
--- /dev/null
+++ b/arch/riscv64.c
@@ -0,0 +1,219 @@
+/*
+ * riscv64.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+#ifdef __riscv64__
+
+#include "../print_info.h"
+#include "../elf_info.h"
+#include "../makedumpfile.h"
+
+int
+get_phys_base_riscv64(void)
+{
+   if (NUMBER(phys_ram_base) != NOT_FOUND_NUMBER)
+   info->phys_base = NUMBER(phys_ram_base);
+   else
+   /* In case that you are using qemu rv64 env */
+   info->phys_base = 0x8020;
+
+   DEBUG_MSG("phys_base: %lx\n", info->phys_base);
+   return TRUE;
+}
+
+int
+get_machdep_info_riscv64(void)
+{
+
+   if(NUMBER(va_bits) == NOT_FOUND_NUMBER ||  NUMBER(page_offset) == NOT_FOUND_NUMBER ||
+  NUMBER(vmalloc_start) == NOT_FOUND_NUMBER || NUMBER(vmalloc_end) == NOT_FOUND_NUMBER ||
+  NUMBER(vmemmap_start) == NOT_FOUND_NUMBER ||  NUMBER(vmemmap_end) == NOT_FOUND_NUMBER ||
+  NUMBER(modules_vaddr) == NOT_FOUND_NUMBER ||  NUMBER(modules_end) == NOT_FOUND_NUMBER ||
+  NUMBER(kernel_link_addr) == NOT_FOUND_NUMBER || NUMBER(va_kernel_pa_offset) == NOT_FOUND_NUMBER)
+   return FALSE;
+
+   if (NUMBER(MAX_PHYSMEM_BITS) != NOT_FOUND_NUMBER)
+   info->max_physmem_bits = NUMBER(MAX_PHYSMEM_BITS);
+   else
+   info->max_physmem_bits = _MAX_PHYSMEM_BITS;
+
+   if (NUMBER(SECTION_SIZE_BITS) != NOT_FOUND_NUMBER)
+   info->section_size_bits = NUMBER(SECTION_SIZE_BITS);
+   else
+   info->section_size_bits = _SECTION_SIZE_BITS;
+
+   info->page_offset = NUMBER(page_offset);
+
+   DEBUG_MSG("va_bits: %ld\n", NUMBER(va_bits));
+   DEBUG_MSG("page_offset: %lx\n", NUMBER(page_offset));
+   DEBUG_MSG("vmalloc_start: %lx\n", NUMBER(vmalloc_start));
+   DEBUG_MSG("vmalloc_end: %lx\n", NUMBER(vmalloc_end));
+   DEBUG_MSG("vmemmap_start: %lx\n", NUMBER(vmemmap_start));
+   DEBUG_MSG("vmemmap_end: %lx\n", NUMBER(vmemmap_end));
+   DEBUG_MSG("modules_vaddr: %lx\n", NUMBER(modules_vaddr));
+   DEBUG_MSG("modules_end: %lx\n", NUMBER(modules_end));
+   DEBUG_MSG("kernel_link_addr: %lx\n", NUMBER(kernel_link_addr));
+   DEBUG_MSG("va_kernel_pa_offset: %lx\n", NUMBER(va_kernel_pa_offset));
+
+   return TRUE;
+}
+
+/*
+ * For direct memory mapping
+ */
+
+#define VTOP(X) ({ \
+   ulong _X = X; \
+   (_X) >= NUMBER(kernel_link_addr) ? ((_X) - (NUMBER(va_kernel_pa_offset))): \
+   ((_X) - PAGE_OFFSET + (info->phys_base)); \
+   })
+
+static unsigned long long
+vtop_riscv64(pgd_t * pgd, unsigned long vaddr, long va_bits)
+{
+   unsigned long long paddr = NOT_PADDR;
+   pgd_t *pgda;
+   p4d_t *p4da;
+   pud_t *puda;
+   pmd_t *pmda;
+   pte_t *ptea;
+   ulong pt_val, pt_phys;
+
+#define pgd_index(X) ((va_bits == VA_BITS_SV57) ? pgd_index_l5(X) :\
+   ((va_bits == VA_BITS_SV48) ? pgd_index_l4(X) : pgd_index_l3(X)))
+
+   /* PGD */
+   pgda = 

Re: [PATCH 04/13] x86/kvm: Do not try to disable kvmclock if it was not enabled

2023-10-10 Thread Kuppuswamy Sathyanarayanan



On 10/5/2023 6:13 AM, Kirill A. Shutemov wrote:
> kvm_guest_cpu_offline() tries to disable kvmclock regardless if it is
> present in the VM. It leads to write to a MSR that doesn't exist on some
> configurations, namely in TDX guest:
> 
>   unchecked MSR access error: WRMSR to 0x12 (tried to write 0x)
>   at rIP: 0x8110687c (kvmclock_disable+0x1c/0x30)
> 
> kvmclock enabling is gated by CLOCKSOURCE and CLOCKSOURCE2 KVM paravirt
> features.
> 
> Do not disable kvmclock if it was not enumerated or disabled by user
> from kernel command line.

To avoid the above warning, checking for the CLOCKSOURCE and CLOCKSOURCE2
features is sufficient, right? Do we also need to include the
user/command-line disable check here?
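
To make the question concrete, here is a small standalone model of the
decision made by the quoted kvmclock_disable() hunk. It is only a sketch:
the struct and the sketch_* names are stand-ins for kvm_para_available(),
the kvmclock flag and the two kvm_para_has_feature() checks, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the inputs the quoted hunk looks at. */
struct sketch_kvm_state {
    bool para_available;   /* stands in for kvm_para_available() */
    bool kvmclock_enabled; /* stands in for the kvmclock flag */
    bool has_clocksource;  /* KVM_FEATURE_CLOCKSOURCE enumerated */
    bool has_clocksource2; /* KVM_FEATURE_CLOCKSOURCE2 enumerated */
};

/* True if the model would perform the MSR write. */
static bool sketch_should_disable_kvmclock(const struct sketch_kvm_state *s)
{
    assert(s != NULL);
    if (!s->para_available || !s->kvmclock_enabled)
        return false;
    return s->has_clocksource || s->has_clocksource2;
}
```

In this model the feature check alone already skips the write when neither
feature is enumerated. The kvmclock flag only matters when a feature is
enumerated but kvmclock was turned off on the command line.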

> 
> Signed-off-by: Kirill A. Shutemov 
> Fixes: c02027b5742b ("x86/kvm: Disable kvmclock on all CPUs on shutdown")
> ---
>  arch/x86/kernel/kvmclock.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
> index fb8f52149be9..cba2e732e53f 100644
> --- a/arch/x86/kernel/kvmclock.c
> +++ b/arch/x86/kernel/kvmclock.c
> @@ -22,7 +22,7 @@
>  #include 
>  #include 
>  
> -static int kvmclock __initdata = 1;
> +static int kvmclock __ro_after_init = 1;
>  static int kvmclock_vsyscall __initdata = 1;
>  static int msr_kvm_system_time __ro_after_init = MSR_KVM_SYSTEM_TIME;
>  static int msr_kvm_wall_clock __ro_after_init = MSR_KVM_WALL_CLOCK;
> @@ -195,7 +195,12 @@ static void kvm_setup_secondary_clock(void)
>  
>  void kvmclock_disable(void)
>  {
> - native_write_msr(msr_kvm_system_time, 0, 0);
> + if (!kvm_para_available() || !kvmclock)
> + return;
> +
> + if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE) ||
> + kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2))
> + native_write_msr(msr_kvm_system_time, 0, 0);
>  }
>  
>  static void __init kvmclock_init_mem(void)

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer



Re: [PATCH 03/13] cpu/hotplug, x86/acpi: Disable CPU hotplug for ACPI MADT wakeup

2023-10-10 Thread Kuppuswamy Sathyanarayanan



On 10/5/2023 6:13 AM, Kirill A. Shutemov wrote:
> ACPI MADT doesn't allow to offline CPU after it got woke up.
> 

I think you can use the term "CPU hotplug" instead of just offline.

> Currently hotplug prevented based on the confidential computing
> attribute which is set for Intel TDX. But TDX is not the only possible
> user of the wake up method.
> 
> Mark CPU hotplug as "not supported" on ACPI MADT wakeup enumeration.

Looks good to me.

Reviewed-by: Kuppuswamy Sathyanarayanan 


> 
> Signed-off-by: Kirill A. Shutemov 
> ---
>  arch/x86/coco/core.c   |  1 -
>  arch/x86/kernel/acpi/madt_wakeup.c |  4 
>  include/linux/cc_platform.h| 10 --
>  kernel/cpu.c   |  2 +-
>  4 files changed, 5 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
> index eeec9986570e..f07c3bb7deab 100644
> --- a/arch/x86/coco/core.c
> +++ b/arch/x86/coco/core.c
> @@ -20,7 +20,6 @@ static bool noinstr intel_cc_platform_has(enum cc_attr attr)
>  {
>   switch (attr) {
>   case CC_ATTR_GUEST_UNROLL_STRING_IO:
> - case CC_ATTR_HOTPLUG_DISABLED:
>   case CC_ATTR_GUEST_MEM_ENCRYPT:
>   case CC_ATTR_MEM_ENCRYPT:
>   return true;
> diff --git a/arch/x86/kernel/acpi/madt_wakeup.c b/arch/x86/kernel/acpi/madt_wakeup.c
> index 1b9747bfd5b9..15bdf10b1393 100644
> --- a/arch/x86/kernel/acpi/madt_wakeup.c
> +++ b/arch/x86/kernel/acpi/madt_wakeup.c
> @@ -1,4 +1,5 @@
>  #include 
> +#include 
>  #include 
>  
>  /* Physical address of the Multiprocessor Wakeup Structure mailbox */
> @@ -74,6 +75,9 @@ int __init acpi_parse_mp_wake(union acpi_subtable_headers *header,
>  
>   acpi_mp_wake_mailbox_paddr = mp_wake->base_address;
>  
> + /* Disable CPU onlining/offlining */
> + cpu_hotplug_not_supported();
> +
>   apic_update_callback(wakeup_secondary_cpu_64, acpi_wakeup_cpu);
>  
>   return 0;
> diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
> index cb0d6cd1c12f..d08dd65b5c43 100644
> --- a/include/linux/cc_platform.h
> +++ b/include/linux/cc_platform.h
> @@ -80,16 +80,6 @@ enum cc_attr {
>* using AMD SEV-SNP features.
>*/
>   CC_ATTR_GUEST_SEV_SNP,
> -
> - /**
> -  * @CC_ATTR_HOTPLUG_DISABLED: Hotplug is not supported or disabled.
> -  *
> -  * The platform/OS is running as a guest/virtual machine does not
> -  * support CPU hotplug feature.
> -  *
> -  * Examples include TDX Guest.
> -  */
> - CC_ATTR_HOTPLUG_DISABLED,
>  };
>  
>  #ifdef CONFIG_ARCH_HAS_CC_PLATFORM
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index cf536fe1a88a..9d4279476b40 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1522,7 +1522,7 @@ static int cpu_down_maps_locked(unsigned int cpu, enum cpuhp_state target)
>* If the platform does not support hotplug, report it explicitly to
>* differentiate it from a transient offlining failure.
>*/
> - if (cc_platform_has(CC_ATTR_HOTPLUG_DISABLED) || !cpu_hotplug_supported)
> + if (!cpu_hotplug_supported)
>   return -EOPNOTSUPP;
>   if (cpu_hotplug_disabled)
>   return -EBUSY;

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer



Re: [PATCH 02/13] kernel/cpu: Add support for declaring CPU hotplug not supported

2023-10-10 Thread Kuppuswamy Sathyanarayanan



On 10/5/2023 6:13 AM, Kirill A. Shutemov wrote:
> The function cpu_hotplug_not_supported() can be called to indicate that
> CPU hotplug should be disabled. It does not prevent the initial bring up
> of the CPU, but it stops subsequent offlining.
> 
> This function is intended to replace CC_ATTR_HOTPLUG_DISABLED.
> 

Looks good to me.

Reviewed-by: Kuppuswamy Sathyanarayanan 


> Signed-off-by: Kirill A. Shutemov 
> ---
>  include/linux/cpu.h |  2 ++
>  kernel/cpu.c| 17 -
>  2 files changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> index f19f56501809..aab3887cadbc 100644
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -132,6 +132,7 @@ extern void cpus_read_lock(void);
>  extern void cpus_read_unlock(void);
>  extern int  cpus_read_trylock(void);
>  extern void lockdep_assert_cpus_held(void);
> +extern void cpu_hotplug_not_supported(void);
>  extern void cpu_hotplug_disable(void);
>  extern void cpu_hotplug_enable(void);
>  void clear_tasks_mm_cpumask(int cpu);
> @@ -147,6 +148,7 @@ static inline void cpus_read_lock(void) { }
>  static inline void cpus_read_unlock(void) { }
>  static inline int  cpus_read_trylock(void) { return true; }
>  static inline void lockdep_assert_cpus_held(void) { }
> +static inline void cpu_hotplug_not_supported(void) { }
>  static inline void cpu_hotplug_disable(void) { }
>  static inline void cpu_hotplug_enable(void) { }
>  static inline int remove_cpu(unsigned int cpu) { return -EPERM; }
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 6de7c6bb74ee..cf536fe1a88a 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -484,6 +484,9 @@ static int cpu_hotplug_disabled;
>  
>  DEFINE_STATIC_PERCPU_RWSEM(cpu_hotplug_lock);
>  
> +/* Cleared if platform declares CPU hotplug not supported */
> +static bool cpu_hotplug_supported = true;
> +
>  void cpus_read_lock(void)
>  {
>   percpu_down_read(&cpu_hotplug_lock);
> @@ -543,6 +546,18 @@ static void lockdep_release_cpus_lock(void)
>   rwsem_release(&cpu_hotplug_lock.dep_map, _THIS_IP_);
>  }
>  
> +/*
> + * Declare CPU hotplug not supported.
> + *
> + * It doesn't prevent initial bring up of the CPU, but stops offlining.
> + */
> +void cpu_hotplug_not_supported(void)
> +{
> + cpu_maps_update_begin();
> + cpu_hotplug_supported = false;
> + cpu_maps_update_done();
> +}

Since this function is not used in this patch, do you need to add
__maybe_unused to avoid warnings?

> +
>  /*
>   * Wait for currently running CPU hotplug operations to complete (if any) and
>   * disable future CPU hotplug (from sysfs). The 'cpu_add_remove_lock' protects
> @@ -1507,7 +1522,7 @@ static int cpu_down_maps_locked(unsigned int cpu, enum cpuhp_state target)
>* If the platform does not support hotplug, report it explicitly to
>* differentiate it from a transient offlining failure.
>*/
> - if (cc_platform_has(CC_ATTR_HOTPLUG_DISABLED))
> + if (cc_platform_has(CC_ATTR_HOTPLUG_DISABLED) || !cpu_hotplug_supported)
>   return -EOPNOTSUPP;
>   if (cpu_hotplug_disabled)
>   return -EBUSY;

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer



Re: [kexec-tools] Archive file is missed iomem.h file under loongarch architecture.

2023-10-10 Thread Simon Horman
On Mon, Oct 09, 2023 at 05:47:43PM +0800, Ming Wang wrote:
> Hi,  maintainers,
> 
> 
> I get the kexec-tools 2.0.27 from 
> http://kernel.org/pub/linux/utils/kernel/kexec/kexec-tools-2.0.27.tar.gz, 
> 
> But I noticed that the kexec-tools-2.0.27/kexec/arch/loongarch/iomem.h file 
> was missing from
> 
> this archive.
> 
> 
> This causes build errors in many distributions, like debian.  The error 
> message is as follows,
> 
> make[1]: *** [Makefile:123: kexec/arch/loongarch/crashdump-loongarch.o] Error 
> 1
> kexec/arch/loongarch/kexec-loongarch.c:27:10: fatal error: iomem.h: No such 
> file or directory
>27 | #include "iomem.h"
> 
> See also: https://buildd.debian.org/status/package.php?p=kexec-tools=sid
> 
> 
> Can this archive be repaired and updated?
> 
> 
> Thanks, Ming

Hi,

I need to think about how to deal with this from a release PoV.
But can you check if the patch below resolves your problem?

diff --git a/kexec/arch/loongarch/Makefile b/kexec/arch/loongarch/Makefile
index cee7e569a2a2..f91d0baf049a 100644
--- a/kexec/arch/loongarch/Makefile
+++ b/kexec/arch/loongarch/Makefile
@@ -19,5 +19,6 @@ loongarch_VIRT_TO_PHYS =
 dist += kexec/arch/loongarch/Makefile $(loongarch_KEXEC_SRCS)		\
 	kexec/arch/loongarch/kexec-loongarch.h				\
 	kexec/arch/loongarch/image-header.h				\
+	kexec/arch/loongarch/iomem.h					\
 	kexec/arch/loongarch/crashdump-loongarch.h			\
 	kexec/arch/loongarch/include/arch/options.h



Re: [PATCHv8 2/5] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt

2023-10-10 Thread Hari Bathini




On 09/10/23 5:00 pm, Pingfan Liu wrote:

*** Idea ***
For kexec -p, the boot cpu may not be cpu0, which causes a problem
when allocating memory for paca_ptrs[]. However, in theory, there is no
requirement to assign a cpu's logical id according to its sequence in the
device tree. But there are helpers such as cpu_first_thread_sibling()
that make assumptions about the mapping inside a core. Hence, partially
loosen the mapping, i.e. unbind the mapping between cores while keeping the
mapping inside a core.

*** Implement ***
At this early stage, there is plenty of memory to utilize. Hence, this
patch allocates interim memory to link the cpu info on a list, then
reorders cpus by changing the list head. As a result, there is a rotate
shift between the sequence number in dt and the cpu logical number.

*** Result ***
After this patch, a boot-cpu's logical id will always be mapped into the
range [0,threads_per_core).

Besides this, at this phase, all threads in the boot core are forced to
be onlined. This restriction will be lifted in a later patch with
extra effort.
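
The rotate shift described above can be modelled with plain arithmetic. This
is a simplified sketch, not the list-based implementation in the patch. The
sketch_* helpers are hypothetical and assume every core has the same number
of threads.

```c
#include <assert.h>

/* Shift = device-tree sequence number of the first thread in the boot core. */
static unsigned sketch_shift(unsigned boot_dt_seq, unsigned threads_per_core)
{
    assert(threads_per_core > 0);
    return (boot_dt_seq / threads_per_core) * threads_per_core;
}

/* Logical id of a cpu after rotating the device-tree order by `shift`. */
static unsigned sketch_logical_id(unsigned dt_seq, unsigned shift, unsigned ncpus)
{
    assert(ncpus > 0 && dt_seq < ncpus && shift < ncpus);
    return (dt_seq + ncpus - shift) % ncpus;
}
```

For illustration, with 8 threads per core and 64 cpus, a boot cpu at
device-tree position 58 gives a shift of 56 and a logical id of 2, which is
inside [0, threads_per_core).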

Signed-off-by: Pingfan Liu 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: Mahesh Salgaonkar 
Cc: Wen Xiong 
Cc: Baoquan He 
Cc: Ming Lei 
Cc: kexec@lists.infradead.org
To: linuxppc-...@lists.ozlabs.org
---
  arch/powerpc/kernel/prom.c | 25 +
  arch/powerpc/kernel/setup-common.c | 87 +++---
  2 files changed, 85 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index ec82f5bda908..87272a2d8c10 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -76,7 +76,9 @@ u64 ppc64_rma_size;
  unsigned int boot_cpu_node_count __ro_after_init;
  #endif
  static phys_addr_t first_memblock_size;
+#ifdef CONFIG_SMP
  static int __initdata boot_cpu_count;
+#endif
  
  static int __init early_parse_mem(char *p)

  {
@@ -331,8 +333,7 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
const __be32 *intserv;
int i, nthreads;
int len;
-   int found = -1;
-   int found_thread = 0;
+   bool found = false;
  
  	/* We are scanning "cpu" nodes only */

if (type == NULL || strcmp(type, "cpu") != 0)
@@ -355,8 +356,15 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
for (i = 0; i < nthreads; i++) {
if (be32_to_cpu(intserv[i]) ==
fdt_boot_cpuid_phys(initial_boot_params)) {
-   found = boot_cpu_count;
-   found_thread = i;
+   /*
+* always map the boot-cpu logical id into the
+* range of [0, thread_per_core)
+*/
+   boot_cpuid = i;
+   found = true;
+   /* This works around the hole in paca_ptrs[]. */
+   if (nr_cpu_ids < nthreads)
+   set_nr_cpu_ids(nthreads);
}
  #ifdef CONFIG_SMP
/* logical cpu id is always 0 on UP kernels */
@@ -365,14 +373,13 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
}
  
  	/* Not the boot CPU */

-   if (found < 0)
+   if (!found)
return 0;
  
-	DBG("boot cpu: logical %d physical %d\n", found,

-   be32_to_cpu(intserv[found_thread]));
-   boot_cpuid = found;
+   DBG("boot cpu: logical %d physical %d\n", boot_cpuid,
+   be32_to_cpu(intserv[boot_cpuid]));
  
-	boot_cpu_hwid = be32_to_cpu(intserv[found_thread]);

+   boot_cpu_hwid = be32_to_cpu(intserv[boot_cpuid]);
  
  	/*

 * PAPR defines "logical" PVR values for cpus that
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 1b19a9815672..81291e13dec0 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -36,6 +36,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -425,6 +426,13 @@ static void __init cpu_init_thread_core_maps(int tpc)
  
  u32 *cpu_to_phys_id = NULL;
  
+struct interrupt_server_node {

+   struct list_head node;
+   boolavail;
+   int len;
+   __be32 *intserv;
+};
+
  /**
   * setup_cpu_maps - initialize the following cpu maps:
   *  cpu_possible_mask
@@ -446,11 +454,16 @@ u32 *cpu_to_phys_id = NULL;
  void __init smp_setup_cpu_maps(void)
  {
struct device_node *dn;
-   int cpu = 0;
-   int nthreads = 1;
+   int shift = 0, cpu = 0;
+   int j, nthreads = 1;
+   int len;
+   struct interrupt_server_node *intserv_node, *n;
+   struct list_head *bt_node, head;
+   bool avail, found_boot_cpu = false;
  
  	DBG("smp_setup_cpu_maps()\n");
  
+	INIT_LIST_HEAD(&head);

cpu_to_phys_id = memblock_alloc(nr_cpu_ids * sizeof(u32),
__alignof__(u32));
if 

Re: [PATCH 03/13] cpu/hotplug, x86/acpi: Disable CPU hotplug for ACPI MADT wakeup

2023-10-10 Thread Huang, Kai

>  /* Physical address of the Multiprocessor Wakeup Structure mailbox */
> @@ -74,6 +75,9 @@ int __init acpi_parse_mp_wake(union acpi_subtable_headers *header,
>  
>   acpi_mp_wake_mailbox_paddr = mp_wake->base_address;
>  
> + /* Disable CPU onlining/offlining */
> + cpu_hotplug_not_supported();
> +

Both onlining/offlining are prevented, or just offlining?

The previous patch says:

It does not prevent the initial bring up of the CPU, but it stops 
subsequent offlining.

And ...

[...]


> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1522,7 +1522,7 @@ static int cpu_down_maps_locked(unsigned int cpu, enum cpuhp_state target)
>* If the platform does not support hotplug, report it explicitly to
>* differentiate it from a transient offlining failure.
>*/
> - if (cc_platform_has(CC_ATTR_HOTPLUG_DISABLED) || !cpu_hotplug_supported)
> + if (!cpu_hotplug_supported)
>   return -EOPNOTSUPP;
>   if (cpu_hotplug_disabled)
>   return -EBUSY;

... here cpu_down_maps_locked() only prevents offlining if I am reading
correctly.

Also, can we rename cpu_hotplug_supported to cpu_offline_supported to match the
behaviour better?

Anyway, isn't it a little bit odd to have:

if (!cpu_hotplug_supported)
return -EOPNOTSUPP;
if (cpu_hotplug_disabled)
return -EBUSY;

?
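
The ordering in question can be modelled outside the kernel. In this sketch
the struct and function names are hypothetical, and only the order of the
two checks is taken from the quoted hunk.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the two flags tested by cpu_down_maps_locked(). */
struct sketch_hotplug_state {
    bool supported; /* models cpu_hotplug_supported */
    int disabled;   /* models cpu_hotplug_disabled (a counter) */
};

/* Mirrors the order of the checks in the quoted hunk. */
static int sketch_cpu_down_check(const struct sketch_hotplug_state *s)
{
    assert(s != NULL);
    if (!s->supported)
        return -EOPNOTSUPP; /* permanent: platform cannot offline */
    if (s->disabled)
        return -EBUSY;      /* transient: hotplug temporarily disabled */
    return 0;
}
```

When both conditions hold, the permanent -EOPNOTSUPP result hides the
transient -EBUSY one.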


Re: [PATCH 09/13] x86/tdx: Account shared memory

2023-10-10 Thread Huang, Kai


> +#ifdef CONFIG_DEBUG_FS
> +static int tdx_shared_memory_show(struct seq_file *m, void *p)
> +{
> + unsigned long addr, end;
> + unsigned long found = 0;
> +
> + addr = PAGE_OFFSET;
> + end  = PAGE_OFFSET + get_max_mapped();
> +
> + while (addr < end) {
> + unsigned long size;
> + unsigned int level;
> + pte_t *pte;
> +
>   pte = lookup_address(addr, &level);
> + size = page_level_size(level);
> +
> + if (pte && pte_decrypted(*pte))
> + found += size / PAGE_SIZE;
> +
> + addr += size;

This could be a long loop, perhaps add cond_resched() here?

> + }
> +
>   seq_printf(m, "Number of unshared pages in kernel page tables:  %16lu\n",
>found);
>   seq_printf(m, "Number of pages accounted as unshared:   %16ld\n",
> +atomic_long_read(_shared));
> + return 0;
> +}
> +



Re: [PATCHv8 1/5] powerpc/setup : Enable boot_cpu_hwid for PPC32

2023-10-10 Thread Sourabh Jain

Hello Pingfan,



With this patch series applied, the kdump kernel fails to boot on 
powerpc with nr_cpus=1.


Console logs:
---
[root]# echo c > /proc/sysrq-trigger
[   74.783235] sysrq: Trigger a crash
[   74.783244] Kernel panic - not syncing: sysrq triggered crash
[   74.783252] CPU: 58 PID: 3838 Comm: bash Kdump: loaded Not tainted 6.6.0-rc5pf-nr-cpus+ #3
[   74.783259] Hardware name: POWER10 (raw) phyp pSeries
[   74.783275] Call Trace:
[   74.783280] [c0020f4ebac0] [c0ed9f38] dump_stack_lvl+0x6c/0x9c (unreliable)
[   74.783291] [c0020f4ebaf0] [c0150300] panic+0x178/0x438
[   74.783298] [c0020f4ebb90] [c0936d48] sysrq_handle_crash+0x28/0x30
[   74.783304] [c0020f4ebbf0] [c093773c] __handle_sysrq+0x10c/0x250
[   74.783309] [c0020f4ebc90] [c0937fa8] write_sysrq_trigger+0xc8/0x168
[   74.783314] [c0020f4ebcd0] [c0665d8c] proc_reg_write+0x10c/0x1b0
[   74.783321] [c0020f4ebd00] [c058da54] vfs_write+0x104/0x4b0
[   74.783326] [c0020f4ebdc0] [c058dfdc] ksys_write+0x7c/0x140
[   74.783331] [c0020f4ebe10] [c0033a64] system_call_exception+0x144/0x3a0
[   74.783337] [c0020f4ebe50] [c000c554] system_call_common+0xf4/0x258

[   74.783343] --- interrupt: c00 at 0x7fffa0721594
[   74.783352] NIP:  7fffa0721594 LR: 7fffa0697bf4 CTR: 

[   74.783364] REGS: c0020f4ebe80 TRAP: 0c00   Not tainted 
(6.6.0-rc5pf-nr-cpus+)
[   74.783376] MSR:  8280f033 
  CR: 2802  XER: 

[   74.783394] IRQMASK: 0
[   74.783394] GPR00: 0004 7c4b6800 
7fffa0807300 0001
[   74.783394] GPR04: 00013549ea60 0002 
0010 
[   74.783394] GPR08:   
 
[   74.783394] GPR12:  7fffa0abaf70 
4000 00011a0f9798
[   74.783394] GPR16: 00011a0f9724 00011a097688 
00011a02ff70 00011a0fd568
[   74.783394] GPR20: 000135554bf0 0001 
00011a0aa478 7c4b6a24
[   74.783394] GPR24: 7c4b6a20 00011a0faf94 
0002 00013549ea60
[   74.783394] GPR28: 0002 7fffa08017a0 
00013549ea60 0002

[   74.783440] NIP [7fffa0721594] 0x7fffa0721594
[   74.783443] LR [7fffa0697bf4] 0x7fffa0697bf4
[   74.783447] --- interrupt: c00
I'm in purgatory
[    0.00] radix-mmu: Page sizes from device-tree:
[    0.00] radix-mmu: Page size shift = 12 AP=0x0
[    0.00] radix-mmu: Page size shift = 16 AP=0x5
[    0.00] radix-mmu: Page size shift = 21 AP=0x1
[    0.00] radix-mmu: Page size shift = 30 AP=0x2
[    0.00] Activating Kernel Userspace Access Prevention
[    0.00] Activating Kernel Userspace Execution Prevention
[    0.00] radix-mmu: Mapped 0x-0x0001 
with 64.0 KiB pages (exec)
[    0.00] radix-mmu: Mapped 0x0001-0x0020 
with 64.0 KiB pages
[    0.00] radix-mmu: Mapped 0x0020-0x2000 
with 2.00 MiB pages
[    0.00] radix-mmu: Mapped 0x2000-0x2260 
with 2.00 MiB pages (exec)
[    0.00] radix-mmu: Mapped 0x2260-0x4000 
with 2.00 MiB pages
[    0.00] radix-mmu: Mapped 0x4000-0x00018000 
with 1.00 GiB pages
[    0.00] radix-mmu: Mapped 0x00018000-0x0001a000 
with 2.00 MiB pages

[    0.00] lpar: Using radix MMU under hypervisor
[    0.00] Linux version 6.6.0-rc5pf-nr-cpus+ 
(r...@ltcever7x0-lp1.aus.stglabs.ibm.com) (gcc (GCC) 8.5.0 20210514 
(Red Hat 8.5.0-20), GNU ld version 2.30-123.el8) #3 SMP Mon Oct  9 11:07:

41 CDT 2023
[    0.00] Found initrd at 0xc00022e6:0xc000248f08d8
[    0.00] Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200 
0xf06 of:IBM,FW1060.00 (NM1060_016) hv:phyp pSeries

[    0.00] printk: bootconsole [udbg0] enabled
[    0.00] the round shift between dt seq and the cpu logic 
number: 56
[    0.00] BUG: Unable to handle kernel data access on write at 
0xc001a000

[    0.00] Faulting instruction address: 0xc00022009c64
[    0.00] Oops: Kernel access of bad area, sig: 11 [#1]
[    0.00] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[    0.00] Modules linked in:
[    0.00] CPU: 2 PID: 0 Comm: swapper Not tainted 
6.6.0-rc5pf-nr-cpus+ #3

[    0.00] Hardware name:  POWER10 (raw)  hv:phyp pSeries
[    0.00] NIP:  c00022009c64 LR: c00022009c54 CTR: 
c000201ff348
[    0.00] REGS: c00022aebb00 TRAP: 0300   Not tainted 
(6.6.0-rc5pf-nr-cpus+)
[    0.00] MSR:  80001033  CR: 
28222824  XER: 0001
[    0.00] CFAR: c00020031574 DAR: c001a000 DSISR: 
4200 IRQMASK: 1
[    0.00] GPR00: c00022009ba0 c00022aebda0 
c000213d1300 

Re: [PATCHv8 3/5] powerpc/setup: Handle the case when boot_cpuid greater than nr_cpus

2023-10-10 Thread Hari Bathini




On 09/10/23 5:00 pm, Pingfan Liu wrote:

If the boot_cpuid is not smaller than nr_cpus, it requires extra effort to
ensure the boot_cpu is in cpu_present_mask. This can be achieved by
reserving the last quota for the boot cpu.

Note: the restriction on nr_cpus will be lifted with more effort in the
successive patches

Signed-off-by: Pingfan Liu 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: Mahesh Salgaonkar 
Cc: Wen Xiong 
Cc: Baoquan He 
Cc: Ming Lei 
Cc: kexec@lists.infradead.org
To: linuxppc-...@lists.ozlabs.org
---
  arch/powerpc/kernel/setup-common.c | 25 ++---
  1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 81291e13dec0..f9ef0a2666b0 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -454,8 +454,8 @@ struct interrupt_server_node {
  void __init smp_setup_cpu_maps(void)
  {
struct device_node *dn;
-   int shift = 0, cpu = 0;
-   int j, nthreads = 1;
+   int terminate, shift = 0, cpu = 0;
+   int j, bt_thread = 0, nthreads = 1;
int len;
struct interrupt_server_node *intserv_node, *n;
struct list_head *bt_node, head;
@@ -518,6 +518,7 @@ void __init smp_setup_cpu_maps(void)
for (j = 0 ; j < nthreads; j++) {
if (be32_to_cpu(intserv[j]) == boot_cpu_hwid) {
bt_node = &intserv_node->node;
+   bt_thread = j;
found_boot_cpu = true;
/*
 * Record the round-shift between dt
@@ -537,11 +538,21 @@ void __init smp_setup_cpu_maps(void)
/* Select the primary thread, the boot cpu's slibing, as the logic 0 */
list_add_tail(&head, bt_node);
pr_info("the round shift between dt seq and the cpu logic number: %d\n", shift);
+   terminate = nr_cpu_ids;
list_for_each_entry(intserv_node, &head, node) {
  
+		j = 0;



+   /* Choose a start point to cover the boot cpu */
+   if (nr_cpu_ids - 1 < bt_thread) {
+   /*
+* The processor core puts assumption on the thread id,
+* not to breach the assumption.
+*/
+   terminate = nr_cpu_ids - 1;


nthreads is assumed to be the same for all cores anyway. So, enforcing
nr_cpu_ids to be at least nthreads (and a multiple of nthreads) should
make the code much simpler, without the need for the above check and the
other complexities addressed in the subsequent patches...
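
As a rough illustration of that suggestion (a sketch only:
round_nr_cpus_to_core() is a hypothetical helper that does not exist in
the posted series, and rounding up rather than down is an assumption):

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of the posted patch: clamp the requested
 * CPU limit so that it covers whole cores only. The result is at least
 * nthreads and always a multiple of nthreads.
 */
static unsigned int round_nr_cpus_to_core(unsigned int nr_cpus,
					  unsigned int nthreads)
{
	assert(nthreads > 0);

	if (nr_cpus < nthreads)
		return nthreads;

	/* Round up to the next multiple of nthreads. */
	return ((nr_cpus + nthreads - 1) / nthreads) * nthreads;
}
```

With 8 threads per core, nr_cpus=1 would become 8 under this scheme, so
the boot cpu's whole core would fit regardless of its thread id.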

Thanks
Hari

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCHv8 1/5] powerpc/setup : Enable boot_cpu_hwid for PPC32

2023-10-10 Thread Sourabh Jain

Hello Pingfan,



With this patch series applied, the kdump kernel fails to boot on 
powerpc with nr_cpus=1.


Console logs:
---
[root]# echo c > /proc/sysrq-trigger
[   74.783235] sysrq: Trigger a crash
[   74.783244] Kernel panic - not syncing: sysrq triggered crash
[   74.783252] CPU: 58 PID: 3838 Comm: bash Kdump: loaded Not tainted 6.6.0-rc5pf-nr-cpus+ #3
[   74.783259] Hardware name: POWER10 (raw) phyp pSeries
[   74.783275] Call Trace:
[   74.783280] [c0020f4ebac0] [c0ed9f38] dump_stack_lvl+0x6c/0x9c (unreliable)
[   74.783291] [c0020f4ebaf0] [c0150300] panic+0x178/0x438
[   74.783298] [c0020f4ebb90] [c0936d48] sysrq_handle_crash+0x28/0x30
[   74.783304] [c0020f4ebbf0] [c093773c] __handle_sysrq+0x10c/0x250
[   74.783309] [c0020f4ebc90] [c0937fa8] write_sysrq_trigger+0xc8/0x168
[   74.783314] [c0020f4ebcd0] [c0665d8c] proc_reg_write+0x10c/0x1b0
[   74.783321] [c0020f4ebd00] [c058da54] vfs_write+0x104/0x4b0
[   74.783326] [c0020f4ebdc0] [c058dfdc] ksys_write+0x7c/0x140
[   74.783331] [c0020f4ebe10] [c0033a64] system_call_exception+0x144/0x3a0
[   74.783337] [c0020f4ebe50] [c000c554] system_call_common+0xf4/0x258
[   74.783343] --- interrupt: c00 at 0x7fffa0721594
[   74.783352] NIP:  7fffa0721594 LR: 7fffa0697bf4 CTR:
[   74.783364] REGS: c0020f4ebe80 TRAP: 0c00   Not tainted (6.6.0-rc5pf-nr-cpus+)
[   74.783376] MSR:  8280f033   CR: 2802  XER:
[   74.783394] IRQMASK: 0
[   74.783394] GPR00: 0004 7c4b6800 7fffa0807300 0001
[   74.783394] GPR04: 00013549ea60 0002 0010
[   74.783394] GPR08:
[   74.783394] GPR12:  7fffa0abaf70 4000 00011a0f9798
[   74.783394] GPR16: 00011a0f9724 00011a097688 00011a02ff70 00011a0fd568
[   74.783394] GPR20: 000135554bf0 0001 00011a0aa478 7c4b6a24
[   74.783394] GPR24: 7c4b6a20 00011a0faf94 0002 00013549ea60
[   74.783394] GPR28: 0002 7fffa08017a0 00013549ea60 0002

[   74.783440] NIP [7fffa0721594] 0x7fffa0721594
[   74.783443] LR [7fffa0697bf4] 0x7fffa0697bf4
[   74.783447] --- interrupt: c00
I'm in purgatory
[    0.00] radix-mmu: Page sizes from device-tree:
[    0.00] radix-mmu: Page size shift = 12 AP=0x0
[    0.00] radix-mmu: Page size shift = 16 AP=0x5
[    0.00] radix-mmu: Page size shift = 21 AP=0x1
[    0.00] radix-mmu: Page size shift = 30 AP=0x2
[    0.00] Activating Kernel Userspace Access Prevention
[    0.00] Activating Kernel Userspace Execution Prevention
[    0.00] radix-mmu: Mapped 0x-0x0001 with 64.0 KiB pages (exec)
[    0.00] radix-mmu: Mapped 0x0001-0x0020 with 64.0 KiB pages
[    0.00] radix-mmu: Mapped 0x0020-0x2000 with 2.00 MiB pages
[    0.00] radix-mmu: Mapped 0x2000-0x2260 with 2.00 MiB pages (exec)
[    0.00] radix-mmu: Mapped 0x2260-0x4000 with 2.00 MiB pages
[    0.00] radix-mmu: Mapped 0x4000-0x00018000 with 1.00 GiB pages
[    0.00] radix-mmu: Mapped 0x00018000-0x0001a000 with 2.00 MiB pages
[    0.00] lpar: Using radix MMU under hypervisor
[    0.00] Linux version 6.6.0-rc5pf-nr-cpus+ (r...@ltcever7x0-lp1.aus.stglabs.ibm.com) (gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-20), GNU ld version 2.30-123.el8) #3 SMP Mon Oct  9 11:07:41 CDT 2023
[    0.00] Found initrd at 0xc00022e6:0xc000248f08d8
[    0.00] Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200 0xf06 of:IBM,FW1060.00 (NM1060_016) hv:phyp pSeries
[    0.00] printk: bootconsole [udbg0] enabled
[    0.00] the round shift between dt seq and the cpu logic number: 56
[    0.00] BUG: Unable to handle kernel data access on write at 0xc001a000
[    0.00] Faulting instruction address: 0xc00022009c64
[    0.00] Oops: Kernel access of bad area, sig: 11 [#1]
[    0.00] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[    0.00] Modules linked in:
[    0.00] CPU: 2 PID: 0 Comm: swapper Not tainted 6.6.0-rc5pf-nr-cpus+ #3
[    0.00] Hardware name:  POWER10 (raw)  hv:phyp pSeries
[    0.00] NIP:  c00022009c64 LR: c00022009c54 CTR: c000201ff348
[    0.00] REGS: c00022aebb00 TRAP: 0300   Not tainted (6.6.0-rc5pf-nr-cpus+)
[    0.00] MSR:  80001033  CR: 28222824  XER: 0001
[    0.00] CFAR: c00020031574 DAR: c001a000 DSISR: 4200 IRQMASK: 1
[    0.00] GPR00: c00022009ba0 c00022aebda0 c000213d1300

Re: [PATCH makedumpfile 0/2] Add riscv64 support for makedumpfile

2023-10-10 Thread 萩尾 一仁
On 2023/10/07 11:27, Song Shuai wrote:
> 
> 
> On 2023/10/3 12:22, HAGIO KAZUHITO(萩尾 一仁) wrote:
>> Hi,
>>
>> thank you for the patch.
>>
>> On 2023/09/27 20:18, Song Shuai wrote:
>>> These 2 patches add riscv64 support for makedumpfile:
>>>
>>> Patch1 - Add riscv64 support
>>> ===
>>>
>>> This patch adds support for riscv64 in makedumpfile.
>>> It implements the "vtop" for kernel memory regions
>>> and supports Sv39/Sv48/Sv57 page modes for RV64.
>>
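
For readers unfamiliar with the three modes, here is a minimal sketch of
the index arithmetic a "vtop" walk relies on. It is not the code from
arch/riscv64.c; the names (rv64_mode, va_bits, vpn_index, va_page_offset)
are made up for illustration, and 4 KiB base pages are assumed. Each
level consumes 9 bits of the virtual address, so Sv39/Sv48/Sv57 differ
only in how many levels are walked.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* 4 KiB pages */
#define VPN_BITS    9 /* 512 entries per table on RV64 */

/* Number of page-table levels for each RV64 paging mode. */
enum rv64_mode { SV39 = 3, SV48 = 4, SV57 = 5 };

/* Width in bits of a translatable virtual address in the given mode. */
static unsigned int va_bits(enum rv64_mode mode)
{
	return PAGE_SHIFT + VPN_BITS * (unsigned int)mode;
}

/* Table index used at `level` (0 = last-level table, mode - 1 = root). */
static unsigned int vpn_index(uint64_t va, enum rv64_mode mode,
			      unsigned int level)
{
	assert(level < (unsigned int)mode);
	return (unsigned int)((va >> (PAGE_SHIFT + VPN_BITS * level)) &
			      ((1u << VPN_BITS) - 1));
}

/* Byte offset inside the final 4 KiB page. */
static unsigned int va_page_offset(uint64_t va)
{
	return (unsigned int)(va & ((1u << PAGE_SHIFT) - 1));
}
```

A walk starts at level mode - 1 with the root table and descends one
level per non-leaf entry, so an Sv57 walk makes up to five lookups where
Sv39 makes up to three.
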
>> Could I have a log of makedumpfile with --message-level 31 option for
>> reference? e.g.
>>     makedumpfile -c -d 31 --message-level 31 vmcore dumpfile > mkdf.log
>>
>> (IIRC the kexec mail list doesn't accept attached files, so please send
>> it off-list.)
> 
> Sorry for the late reply,
> 
> here is the log for the Sv57 and SPARSE_EXTREME kernel:
> 
> https://termbin.com/zcf9:
> 
> and the log for FLATMEM
> 
> https://termbin.com/t89k

Thank you for the information.

> 
>>
>>>
>>>
>>> Patch2 - riscv64: Correct the pfn_start for flatmem
>>> ==
>>>
>>> This patch temporarily fixes an issue seen in the FLATMEM tests,
>>> as the commit-msg says:
>>>   To let info->max_mapnr indicte the direct max PFN and then
>>
>> This means "indicate", right?
>>
> Right, I will fix it if you're OK with Patch2.

The patches look good, so I applied them with that fix and several
indent adjustments.

Thanks,
Kazu

> 
>> Thanks,
>> Kazu
>>
>>>   make the kdump header's max_mapnr_64 correct, riscv64 port
>>>   didn't define ARCH_PFN_OFFSET.
>>>   As for FLATMEM type, the pfn region of mem_map_data should
>>>   be adjusted to start from info->phys_base instead of zero.
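
To make the FLATMEM adjustment concrete, here is a small sketch of the
arithmetic described in that commit message. It is not the makedumpfile
code: struct pfn_range and flatmem_pfn_range() are hypothetical names,
4 KiB pages are assumed, and max_mapnr is taken to be the exclusive
upper PFN bound.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumes 4 KiB pages */

struct pfn_range {
	uint64_t start; /* first PFN described by mem_map */
	uint64_t end;   /* one past the last PFN (assumed meaning of max_mapnr) */
};

/*
 * Hypothetical helper: on a FLATMEM kernel whose RAM does not begin at
 * physical address 0, the PFN window starts at phys_base >> PAGE_SHIFT
 * rather than at 0.
 */
static struct pfn_range flatmem_pfn_range(uint64_t phys_base,
					  uint64_t max_mapnr)
{
	struct pfn_range r;

	r.start = phys_base >> PAGE_SHIFT;
	r.end = max_mapnr;
	assert(r.end >= r.start);
	return r;
}
```

For example, with RAM based at 0x80000000 and 512 MiB of memory
(illustrative numbers, not taken from the logs), the window would be
PFNs 0x80000 up to 0xa0000, i.e. 0x20000 pages, instead of 0xa0000 pages
counted from PFN 0.
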
>>>
>>> I have not considered or tested other arches, so I simply check
>>> for __riscv64__ instead of ARCH_PFN_OFFSET. Maybe we can
>>> improve it.
>>>
>>>
>>> Tests
>>> =
>>>
>>> With these 2 patches, the following tests passed on an RV64 QEMU
>>> virt machine:
>>>
>>> Preparation:
>>> ---
>>>
>>> 1. build kernel with FLATMEM and SPARSE memory models
>>> 2. boot kernel with 3 different page-modes by setting no4lvl/no5lvl in
>>> cmdline
>>> 3. panic kernel
>>>
>>> Tests:
>>> -
>>>
>>> 1. create kdump-compressed file via this command
>>>  - `/mnt/mkdf_f -d31 -f -c /proc/vmcore /mnt/dump.file1`
>>>  - or with `--vtop` option to translate some typical addresses 
>>> (like:
>>>    kernel_link_addr, vmalloc_start, page_offset)
>>>
>>> 2. start crash with kdump file and do some VTOPs
>>>
>>>
>>> A test log:
>>> ---
>>>
>>> # With the Sv57 and SPARSE_EXTREME kernel
>>> # vtop the vmalloc start address -- 0xff20
>>>
>>>
>>> # /mnt/mkdf_f  --vtop 0xff20 -d31 -f --non-mmap -c 
>>> /proc/vmcore /mnt/dump.file1
>>>
>>> Translating virtual address ff20 to physical address.
>>> VIRTUAL   PHYSICAL
>>> ff20  80087000
>>>
>>> Copying data  : [100.0 %] |
>>> eta: 0s
>>>
>>> The dumpfile is saved to /mnt/dump.file1.
>>>
>>> makedumpfile Completed.
>>>
>>> # sudo ../crash/crash /home/song/9_linux/linux/00_rv_def/vmlinux 
>>> /tmp/hello/dump.file1
>>> ...
>>>     KERNEL: /home/song/9_linux/linux/00_rv_def/vmlinux
>>>   DUMPFILE: /tmp/hello/dump.file1  [PARTIAL DUMP]
>>>   CPUS: 2
>>>   DATE: Wed Sep 27 18:37:45 CST 2023
>>>     UPTIME: 00:00:18
>>> LOAD AVERAGE: 0.00, 0.00, 0.00
>>>  TASKS: 55
>>>   NODENAME: (none)
>>>    RELEASE: 6.6.0-rc1-7-g22bfc766389c
>>>    VERSION: #1 SMP Mon Sep 25 19:29:05 CST 2023
>>>    MACHINE: riscv64  (unknown Mhz)
>>>     MEMORY: 511.8 MB
>>>  PANIC: "Kernel panic - not syncing: sysrq triggered crash"
>>>    PID: 1
>>>    COMMAND: "sh"
>>>   TASK: ff6e  [THREAD_INFO: ff6e]
>>>    CPU: 1
>>>  STATE: TASK_RUNNING (PANIC)
>>>
>>> crash> vtop 0xff20
>>> VIRTUAL   PHYSICAL
>>> ff20  80087000
>>>
>>>     PGD: 814fa900 => 20010c01
>>>     P4D: 80043000 => 20025401
>>>     PUD: 80095000 => 20025801
>>>     PMD: 80096000 => 20026001
>>>     PTE: 80098000 => 20021ce7
>>>    PAGE: 80087000
>>>
>>>     PTE PHYSICAL  FLAGS
>>> 20021ce7  80087000  (PRESENT|READ|WRITE|GLOBAL|ACCESSED|DIRTY)
>>>
>>>     PAGE   PHYSICAL  MAPPING   INDEX CNT FLAGS
>>> ff1c020021c0 80087000    0    0  1 0  // same as 
>>> the makedumpfile's vtop
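
The entries in the crash output above can be cross-checked by hand. An
RV64 page-table entry keeps its physical page number in bits 53:10 and
its flags in the low 8 bits, so shifting right by 10 and left by 12
yields the next table (or the final page). The sketch below is for
illustration only; the helper names are made up, and the hex values are
copied as they appear in the log.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT    12                 /* 4 KiB pages */
#define PTE_PPN_SHIFT 10                 /* PPN field starts at bit 10 */
#define PTE_PPN_MASK  ((1ULL << 44) - 1) /* 44-bit PPN on RV64 */

/* Low flag bits of an RV64 PTE. */
#define PTE_V (1u << 0) /* valid (crash prints PRESENT) */
#define PTE_R (1u << 1)
#define PTE_W (1u << 2)
#define PTE_X (1u << 3)
#define PTE_U (1u << 4)
#define PTE_G (1u << 5)
#define PTE_A (1u << 6)
#define PTE_D (1u << 7)

/* Physical address an entry points at: next table or final page. */
static uint64_t pte_to_phys(uint64_t pte)
{
	return ((pte >> PTE_PPN_SHIFT) & PTE_PPN_MASK) << PAGE_SHIFT;
}

/* A valid entry with any of R/W/X set is a leaf; otherwise it is a pointer. */
static int pte_is_leaf(uint64_t pte)
{
	return (pte & PTE_V) && (pte & (PTE_R | PTE_W | PTE_X));
}

/* Replays the five entries printed by crash for the Sv57 walk. */
static void check_logged_walk(void)
{
	assert(pte_to_phys(0x20010c01ULL) == 0x80043000ULL); /* PGD entry */
	assert(pte_to_phys(0x20025401ULL) == 0x80095000ULL); /* P4D entry */
	assert(pte_to_phys(0x20025801ULL) == 0x80096000ULL); /* PUD entry */
	assert(pte_to_phys(0x20026001ULL) == 0x80098000ULL); /* PMD entry */
	assert(!pte_is_leaf(0x20026001ULL));
	assert(pte_is_leaf(0x20021ce7ULL));
	assert(pte_to_phys(0x20021ce7ULL) == 0x80087000ULL); /* final page */
}
```

Each non-leaf entry decodes to the address shown on the following line
of the walk, and the flag byte 0xe7 of the last entry is
V|R|W|G|A|D, which agrees with the flags crash prints.
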
>>>
>>>
>>> Song Shuai (2):
>>>     Add riscv64 support
>>>     riscv64: Correct the pfn_start for flatmem
>>>
>>>    Makefile   |   2 +-
>>>    arch/riscv64.c | 219 +
>>>    makedumpfile.c |  18 
>>>    makedumpfile.h | 107 
>>>    4 files changed, 345 insertions(+), 1 deletion(-)
>>>    create mode 100644 arch/riscv64.c
>