Re: [PATCH] kexec: do syscore_shutdown() in kernel_kexec
On Tue, 2023-12-19 at 12:22 +0800, Baoquan He wrote: > Add Andrew to CC as Andrew helps to pick kexec/kdump patches. Ah, thanks, I didn't realise that Andrew pulls in the kexec patches. > > On 12/13/23 at 08:40am, James Gowans wrote: > .. > > This has been tested by doing a kexec on x86_64 and aarch64. > > Hi James, > > Thanks for this great patch. My colleagues have opened bug in rhel to > track this and try to veryfy this patch. However, they can't reproduce > the issue this patch is fixing. Could you tell more about where and how > to reproduce so that we can be aware of it better? Thanks in advance. Sure! The TL;DR is: run a VMX (Intel x86) KVM VM on Linux v6.4+ and do a kexec while the KVM VM is still running. Before this patch the system will triple fault. In more detail: Run a bare metal host on a modern Intel CPU with VMX support. The kernel I was using was 6.7.0-rc5+. You can totally do this with a QEMU "host" as well, btw, that's how I did the debugging and attached GDB to it to figure out what was up. If you want a virtual "host" launch with: -cpu host -M q35,kernel-irqchip=split,accel=kvm -enable-kvm Launch a KVM guest VM, eg: qemu-system-x86_64 \ -enable-kvm \ -cdrom alpine-virt-3.19.0-x86_64.iso \ -nodefaults -nographic -M q35 \ -serial mon:stdio While the guest VM is *still running* do a kexec on the host, eg: kexec -l --reuse-cmdline --initrd=config-6.7.0-rc5+ vmlinuz-6.7.0-rc5+ && \ kexec -e The kexec can be to anything, but I generally just kexec to the same kernel/ramdisk as is currently running. Ie: same-version kexec. Before this patch the kexec will get stuck, after this the kexec will go smoothly and the system will end up in the new kernel in a few seconds. I hope those steps are clear and you can repro this? BTW, the reason that it's important for the KVM VM to still be running when the host does the kexec is because KVM internally maintains a usage counter and will disable virtualisation once all VMs have been terminated, via: __fput(kvm_fd) kvm_vm_release kvm_destroy_vm hardware_disable_all hardware_disable_all_nolock kvm_usage_count--; if (!kvm_usage_count) on_each_cpu(hardware_disable_nolock, NULL, 1); So if all KVM fds are closed then kexec will work because VMXE is cleared on all CPUs when the last VM is destroyed. If the KVM fds are still open (ie: QEMU process still exists) then the issue manifests. It sounds nasty to do a kexec while QEMU processes are still around but this is a perfectly normal flow for live update: 1. Pause and Serialise VM state 2. kexec 3. deserialise and resume VMs. In that flow there's no need to actually kill the QEMU process, as long as the VM is *paused* and has been serialised we can happily kexec. JG ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH] x86/kexec: use pr_err() instead of kexec_dprintk() when an error occurs
When an error is detected, use pr_err() instead of kexec_dprintk() to output log message. In addition, remove the unnecessary return from set_page_address(). Signed-off-by: Yuntao Wang --- arch/x86/kernel/kexec-bzimage64.c | 2 +- mm/highmem.c | 2 -- 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c index e9ae0eac6bf9..4a77d5dd4bce 100644 --- a/arch/x86/kernel/kexec-bzimage64.c +++ b/arch/x86/kernel/kexec-bzimage64.c @@ -429,7 +429,7 @@ static void *bzImage64_load(struct kimage *image, char *kernel, * command line. Make sure it does not overflow */ if (cmdline_len + MAX_ELFCOREHDR_STR_LEN > header->cmdline_size) { - kexec_dprintk("Appending elfcorehdr= to command line exceeds maximum allowed length\n"); + pr_err("Appending elfcorehdr= to command line exceeds maximum allowed length\n"); return ERR_PTR(-EINVAL); } diff --git a/mm/highmem.c b/mm/highmem.c index e19269093a93..bd48ba445dd4 100644 --- a/mm/highmem.c +++ b/mm/highmem.c @@ -799,8 +799,6 @@ void set_page_address(struct page *page, void *virtual) } spin_unlock_irqrestore(>lock, flags); } - - return; } void __init page_address_init(void) -- 2.43.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] Documentation: kdump: Clarify the default size of memory reserved by crashkernel low
On 12/18/23 at 11:40am, Pingfan Liu wrote: > From: Pingfan Liu > > The default size reserved for crashkernel=,low is decided by the macro > DEFAULT_CRASH_KERNEL_LOW_SIZE, which is based on arch. > > Signed-off-by: Pingfan Liu > Cc: Baoquan He > Cc: Dave Young > To: kexec@lists.infradead.org > To: linux-...@vger.kernel.org > --- > Documentation/admin-guide/kdump/kdump.rst | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/Documentation/admin-guide/kdump/kdump.rst > b/Documentation/admin-guide/kdump/kdump.rst > index 5762e7477a0c..a021ff155012 100644 > --- a/Documentation/admin-guide/kdump/kdump.rst > +++ b/Documentation/admin-guide/kdump/kdump.rst > @@ -310,8 +310,9 @@ crashkernel syntax > region above 4G, low memory under 4G is needed in this case. There are > three ways to get low memory: > > - 1) Kernel will allocate at least 256M memory below 4G automatically > - if crashkernel=Y,low is not specified. > + 1) Kernel will allocate a chunk of default size memory, which is based > on > + architecture, below 4G automatically if crashkernel=Y,low is not > + specified. Good catch, thx. Acked-by: Baoquan He >2) Let user specify low memory size instead. >3) Specified value 0 will disable low memory allocation:: > > -- > 2.31.1 > ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] kexec: avoid out of bounds in crash_exclude_mem_range()
On Tue, 19 Dec 2023 11:50:32 +0800, fuqiang wang wrote: > 在 2023/12/19 10:47, Yuntao Wang 写道: > > > Hi fuqiang, > > > > Yesterday, I posted two patches that happen to address the bugs you an > > Baoquan > > are currently discussing here, I wasn't aware that you both were also > > working > > on fixing these issues. > > > > Baoquan suggested I talk to you about it. > > > > If you're interested, you can take a look at my patches and review them to > > see > > if there are any issues. If everything is fine, and if you're willing, you > > can > > also add a 'Reviewed-by' tag there. > > > > The following link is for the two patches I posted yesterday: > > > > https://lore.kernel.org/lkml/20231218081915.24120-3-ytco...@gmail.com/t/#u > > > > Sincerely, > > Yuntao > > Hi Yuntao, > > I'm glad you've also noticed this issue. But I'm sorry, I want to solve this > problem myself because this is my first time posting a patch in the community, > and I cherish this opportunity very much. I can truly understand your feelings because I still remember how thrilled I was when my first patch got merged. So keep it up! > > I have carefully reviewed your patch. There is some changes where my views > differ > from yours: > diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c > index c92d88680dbf..3be46f4b441e 100644 > --- a/arch/x86/kernel/crash.c > +++ b/arch/x86/kernel/crash.c > @@ -282,10 +282,6 @@ int crash_setup_memmap_entries(struct kimage *image, > struct boot_params *params) > struct crash_memmap_data cmd; > struct crash_mem *cmem; > > -cmem = vzalloc(struct_size(cmem, ranges, 1)); > -if (!cmem) > -return -ENOMEM; > - > memset(, 0, sizeof(struct crash_memmap_data)); > cmd.params = params; > > @@ -321,6 +317,11 @@ int crash_setup_memmap_entries(struct kimage *image, > struct boot_params *params) > } > > /* Exclude some ranges from crashk_res and add rest to memmap */ > +cmem = vzalloc(struct_size(cmem, ranges, 1)); > +if (!cmem) > +return -ENOMEM; > +cmem->max_nr_ranges = 1; > + > ret = memmap_exclude_ranges(image, cmem, crashk_res.start, > crashk_res.end); > if (ret) > goto out; > > 1. I don't feel very good that you have moved vzalloc() to in front of > memmap_exclude_ranges. Because if memory allocation fails, there is no need to > do anything else afterwards. I moved it here because only memmap_exclude_ranges() and the code below it use cmem. I think it is a good practice to put related code together, which also improves code readability. > > 2. The cmem->max_nr_ranges should be set to 2. Because in > memmap_exclude_ranges, a cmem->ranges[] will be filled in and if a split > occurs > later, another one will be added. With the current code, image->elf_load_addr should be equal to crashk_res.start, so split will not occur in crash_exclude_mem_range(). Therefore, setting cmem->max_nr_ranges to 1 is safe. > > Thanks > fuqiang ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 2/2] crash_core: fix out-of-bounds access check in crash_exclude_mem_range()
On Tue, 19 Dec 2023 11:32:02 +0800, Baoquan He wrote: > Hi Yuntao, > > On 12/19/23 at 10:02am, Yuntao Wang wrote: > > On Mon, 18 Dec 2023 09:29:02 -0800, Andrew Morton > > wrote: > > > > > On Mon, 18 Dec 2023 16:19:15 +0800 Yuntao Wang wrote: > > > > > > > mem->nr_ranges represents the current number of elements stored in > > > > the mem->ranges array, and mem->max_nr_ranges represents the maximum > > > > number > > > > of elements that the mem->ranges array can hold. Therefore, the correct > > > > array out-of-bounds check should be mem->nr_ranges >= > > > > mem->max_nr_ranges. > > > > > > > > > > This does not apply after your own "crash_core: fix and simplify the > > > logic of crash_exclude_mem_range()". What should be done? > > > > Hi Andrew, > > > > I actually prefer the "crash_core: fix and simplify the logic of > > crash_exclude_mem_range()" patch as it makes the final code more concise and > > clear, and less prone to errors. > > > > The current code is too strange, I guess no one can understand why there is > > a break in the for loop when they read this code for the first time. > > > > Moreover, I think the current code is too fragile, it relies on callers > > using > > this function correctly to ensure its correctness, rather than being able to > > guarantee the correctness on its own. I even feel that this function is very > > likely to have bugs again as the code evolves. > > > > However, Baoquan also has his own considerations, he suggests keeping the > > code > > as it is. > > > > The link below is our detailed discussion on this issue: > > There's misunderstanding here. > > Firstly I said I have concern about the patch, I didn't NACK or reject the > patch. > > [PATCH 3/3] crash_core: fix and simplify the logic of > crash_exclude_mem_range() > > Usually, when people said he/she had concern, you may need to > investigate and resolve it or explain why it's not need be cared about. > > E.g on above [PATCH 3/3], we can add below code change to stop scanning > when the left ranges are all above the excluded range, assume the passed > in cmem has a ascending order of ranges. Say so because I checked code > and found that crash_exclude_mem_range() is called in arch arm64, ppc, > riscv and x86. Among them, arm64 and ppc create the cmem from memblock, > riscv and x86 create cmem from iomem. All of them should be in ascending > ordr. The below code change based on your patch 3/3 looks safe to me. > What do you think? > > diff --git a/kernel/crash_core.c b/kernel/crash_core.c > index aab342c2a5ee..39b6c149dc80 100644 > --- a/kernel/crash_core.c > +++ b/kernel/crash_core.c > @@ -574,9 +574,12 @@ int crash_exclude_mem_range(struct crash_mem *mem, > p_start = mstart; > p_end = mend; > > - if (p_start > end || p_end < start) > + if (p_start > end) > continue; > > + if (p_end < start) > + break; > + > /* Truncate any area outside of range */ > if (p_start < start) > p_start = start; > > Secondly, I welcome people who are interested kexec/kdump code, and raise > issues or post patches to fix bug, clean up code. I like these patches. > They can help improve kexec/kdump code and solve problem in advance. > I would like to review and make the patches acceptable and merged > inally. And I also hope people can follow the later issue reported by > other people or LKP if their merged patch caused that. > > Lastly, people are encouraged to help review other people's patch > and give suggestes to improve the code change. If patch author don't > respond for a long while or the work has been suspended for long time, we > can add comment to tell and take over the work to continue. > > These are my personal understanding and thought about kexec/kdump patch > reviewing and maintance. So cheer up. > > > > > https://lore.kernel.org/lkml/20231214163842.129139-3-ytco...@gmail.com/t/#mfd78a97e16251bcb190b0957a0b6cb4b0a096b54 > > > > The final decision on whether to apply that patch is up to you and Baoquan, > > if > > you choose to apply that patch, this patch can be ignored. But if you > > decide not > > to apply that patch, then this patch must be applied, as it fixes a bug in > > the > > crash_exclude_mem_range() function. > > > > Sincerely, > > Yuntao Hi Baoquan, I must clarify that I was not complaining about you. On the contrary, I am grateful to everyone who takes time to review code for others, because I know it is a lot of work. I'm relatively new to the Linux community and still learning the various rules of the community. I'm very sorry that I didn't fully grasp your previous intention. Regarding the method you suggested to add a 'break', I did consider it initially but later decided against it because the memory ranges obtained from iomem may overlap, so I chose a safer way instead. Finally, I would like to
Re: [PATCH] kexec: do syscore_shutdown() in kernel_kexec
Add Andrew to CC as Andrew helps to pick kexec/kdump patches. On 12/13/23 at 08:40am, James Gowans wrote: .. > This has been tested by doing a kexec on x86_64 and aarch64. Hi James, Thanks for this great patch. My colleagues have opened bug in rhel to track this and try to veryfy this patch. However, they can't reproduce the issue this patch is fixing. Could you tell more about where and how to reproduce so that we can be aware of it better? Thanks in advance. Thanks Baoquan > > Fixes: 6735150b6997 ("KVM: Use syscore_ops instead of reboot_notifier to hook > restart/shutdown") > > Signed-off-by: James Gowans > Cc: Eric Biederman > Cc: Paolo Bonzini > Cc: Sean Christopherson > Cc: Marc Zyngier > Cc: Arnd Bergmann > Cc: Tony Luck > Cc: Borislav Petkov > Cc: Thomas Gleixner > Cc: Ingo Molnar > Cc: Chen-Yu Tsai > Cc: Jernej Skrabec > Cc: Samuel Holland > Cc: Pavel Machek > Cc: Sebastian Reichel > Cc: Orson Zhai > Cc: Alexander Graf > Cc: Jan H. Schoenherr > --- > kernel/kexec_core.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > index be5642a4ec49..b926c4db8a91 100644 > --- a/kernel/kexec_core.c > +++ b/kernel/kexec_core.c > @@ -1254,6 +1254,7 @@ int kernel_kexec(void) > kexec_in_progress = true; > kernel_restart_prepare("kexec reboot"); > migrate_to_reboot_cpu(); > + syscore_shutdown(); > > /* >* migrate_to_reboot_cpu() disables CPU hotplug assuming that > -- > 2.34.1 > > > ___ > kexec mailing list > kexec@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/kexec > ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] kexec: avoid out of bounds in crash_exclude_mem_range()
在 2023/12/19 10:47, Yuntao Wang 写道: Hi fuqiang, Yesterday, I posted two patches that happen to address the bugs you an Baoquan are currently discussing here, I wasn't aware that you both were also working on fixing these issues. Baoquan suggested I talk to you about it. If you're interested, you can take a look at my patches and review them to see if there are any issues. If everything is fine, and if you're willing, you can also add a 'Reviewed-by' tag there. The following link is for the two patches I posted yesterday: https://lore.kernel.org/lkml/20231218081915.24120-3-ytco...@gmail.com/t/#u Sincerely, Yuntao Hi Yuntao, I'm glad you've also noticed this issue. But I'm sorry, I want to solve this problem myself because this is my first time posting a patch in the community, and I cherish this opportunity very much. I have carefully reviewed your patch. There is some changes where my views differ from yours: diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index c92d88680dbf..3be46f4b441e 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -282,10 +282,6 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params) struct crash_memmap_data cmd; struct crash_mem *cmem; - cmem = vzalloc(struct_size(cmem, ranges, 1)); - if (!cmem) - return -ENOMEM; - memset(, 0, sizeof(struct crash_memmap_data)); cmd.params = params; @@ -321,6 +317,11 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params) } /* Exclude some ranges from crashk_res and add rest to memmap */ + cmem = vzalloc(struct_size(cmem, ranges, 1)); + if (!cmem) + return -ENOMEM; + cmem->max_nr_ranges = 1; + ret = memmap_exclude_ranges(image, cmem, crashk_res.start, crashk_res.end); if (ret) goto out; 1. I don't feel very good that you have moved vzalloc() to in front of memmap_exclude_ranges. Because if memory allocation fails, there is no need to do anything else afterwards. 2. The cmem->max_nr_ranges should be set to 2. Because in memmap_exclude_ranges, a cmem->ranges[] will be filled in and if a split occurs later, another one will be added. Thanks fuqiang ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 2/2] crash_core: fix out-of-bounds access check in crash_exclude_mem_range()
Hi Yuntao, On 12/19/23 at 10:02am, Yuntao Wang wrote: > On Mon, 18 Dec 2023 09:29:02 -0800, Andrew Morton > wrote: > > > On Mon, 18 Dec 2023 16:19:15 +0800 Yuntao Wang wrote: > > > > > mem->nr_ranges represents the current number of elements stored in > > > the mem->ranges array, and mem->max_nr_ranges represents the maximum > > > number > > > of elements that the mem->ranges array can hold. Therefore, the correct > > > array out-of-bounds check should be mem->nr_ranges >= mem->max_nr_ranges. > > > > > > > This does not apply after your own "crash_core: fix and simplify the > > logic of crash_exclude_mem_range()". What should be done? > > Hi Andrew, > > I actually prefer the "crash_core: fix and simplify the logic of > crash_exclude_mem_range()" patch as it makes the final code more concise and > clear, and less prone to errors. > > The current code is too strange, I guess no one can understand why there is > a break in the for loop when they read this code for the first time. > > Moreover, I think the current code is too fragile, it relies on callers using > this function correctly to ensure its correctness, rather than being able to > guarantee the correctness on its own. I even feel that this function is very > likely to have bugs again as the code evolves. > > However, Baoquan also has his own considerations, he suggests keeping the code > as it is. > > The link below is our detailed discussion on this issue: There's misunderstanding here. Firstly I said I have concern about the patch, I didn't NACK or reject the patch. [PATCH 3/3] crash_core: fix and simplify the logic of crash_exclude_mem_range() Usually, when people said he/she had concern, you may need to investigate and resolve it or explain why it's not need be cared about. E.g on above [PATCH 3/3], we can add below code change to stop scanning when the left ranges are all above the excluded range, assume the passed in cmem has a ascending order of ranges. Say so because I checked code and found that crash_exclude_mem_range() is called in arch arm64, ppc, riscv and x86. Among them, arm64 and ppc create the cmem from memblock, riscv and x86 create cmem from iomem. All of them should be in ascending ordr. The below code change based on your patch 3/3 looks safe to me. What do you think? diff --git a/kernel/crash_core.c b/kernel/crash_core.c index aab342c2a5ee..39b6c149dc80 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -574,9 +574,12 @@ int crash_exclude_mem_range(struct crash_mem *mem, p_start = mstart; p_end = mend; - if (p_start > end || p_end < start) + if (p_start > end) continue; + if (p_end < start) + break; + /* Truncate any area outside of range */ if (p_start < start) p_start = start; Secondly, I welcome people who are interested kexec/kdump code, and raise issues or post patches to fix bug, clean up code. I like these patches. They can help improve kexec/kdump code and solve problem in advance. I would like to review and make the patches acceptable and merged inally. And I also hope people can follow the later issue reported by other people or LKP if their merged patch caused that. Lastly, people are encouraged to help review other people's patch and give suggestes to improve the code change. If patch author don't respond for a long while or the work has been suspended for long time, we can add comment to tell and take over the work to continue. These are my personal understanding and thought about kexec/kdump patch reviewing and maintance. So cheer up. > > https://lore.kernel.org/lkml/20231214163842.129139-3-ytco...@gmail.com/t/#mfd78a97e16251bcb190b0957a0b6cb4b0a096b54 > > The final decision on whether to apply that patch is up to you and Baoquan, if > you choose to apply that patch, this patch can be ignored. But if you decide > not > to apply that patch, then this patch must be applied, as it fixes a bug in the > crash_exclude_mem_range() function. > > Sincerely, > Yuntao > ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] kexec: avoid out of bounds in crash_exclude_mem_range()
Hi fuqiang, Yesterday, I posted two patches that happen to address the bugs you an Baoquan are currently discussing here, I wasn't aware that you both were also working on fixing these issues. Baoquan suggested I talk to you about it. If you're interested, you can take a look at my patches and review them to see if there are any issues. If everything is fine, and if you're willing, you can also add a 'Reviewed-by' tag there. The following link is for the two patches I posted yesterday: https://lore.kernel.org/lkml/20231218081915.24120-3-ytco...@gmail.com/t/#u Sincerely, Yuntao ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] kexec: avoid out of bounds in crash_exclude_mem_range()
Hi fuqiang, Yesterday, I posted two patches that happen to address the bugs you an Baoquan are currently discussing here, I wasn't aware that you both were also working on fixing these issues. Baoquan suggested I talk to you about it. If you're interested, you can take a look at my patches and review them to see if there are any issues. If everything is fine, and if you're willing, you can also add a 'Reviewed-by' tag there. Sincerely, Yuntao ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 2/2] crash_core: fix out-of-bounds access check in crash_exclude_mem_range()
On Mon, 18 Dec 2023 09:29:02 -0800, Andrew Morton wrote: > On Mon, 18 Dec 2023 16:19:15 +0800 Yuntao Wang wrote: > > > mem->nr_ranges represents the current number of elements stored in > > the mem->ranges array, and mem->max_nr_ranges represents the maximum number > > of elements that the mem->ranges array can hold. Therefore, the correct > > array out-of-bounds check should be mem->nr_ranges >= mem->max_nr_ranges. > > > > This does not apply after your own "crash_core: fix and simplify the > logic of crash_exclude_mem_range()". What should be done? Hi Andrew, I actually prefer the "crash_core: fix and simplify the logic of crash_exclude_mem_range()" patch as it makes the final code more concise and clear, and less prone to errors. The current code is too strange, I guess no one can understand why there is a break in the for loop when they read this code for the first time. Moreover, I think the current code is too fragile, it relies on callers using this function correctly to ensure its correctness, rather than being able to guarantee the correctness on its own. I even feel that this function is very likely to have bugs again as the code evolves. However, Baoquan also has his own considerations, he suggests keeping the code as it is. The link below is our detailed discussion on this issue: https://lore.kernel.org/lkml/20231214163842.129139-3-ytco...@gmail.com/t/#mfd78a97e16251bcb190b0957a0b6cb4b0a096b54 The final decision on whether to apply that patch is up to you and Baoquan, if you choose to apply that patch, this patch can be ignored. But if you decide not to apply that patch, then this patch must be applied, as it fixes a bug in the crash_exclude_mem_range() function. Sincerely, Yuntao ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 06/15] arm64: Add KHO support
Hey Rob! On 14.12.23 23:36, Rob Herring wrote: On Wed, Dec 13, 2023 at 12:04:43AM +, Alexander Graf wrote: We now have all bits in place to support KHO kexecs. This patch adds awareness of KHO in the kexec file as well as boot path for arm64 and adds the respective kconfig option to the architecture so that it can use KHO successfully. Signed-off-by: Alexander Graf --- arch/arm64/Kconfig| 12 arch/arm64/kernel/setup.c | 2 ++ arch/arm64/mm/init.c | 8 drivers/of/fdt.c | 41 +++ drivers/of/kexec.c| 36 ++ 5 files changed, 99 insertions(+) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 7b071a00425d..1ba338ce7598 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -1501,6 +1501,18 @@ config ARCH_SUPPORTS_CRASH_DUMP config ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION def_bool CRASH_CORE +config KEXEC_KHO + bool "kexec handover" + depends on KEXEC + select MEMBLOCK_SCRATCH + select LIBFDT + select CMA + help + Allow kexec to hand over state across kernels by generating and + passing additional metadata to the target kernel. This is useful + to keep data or state alive across the kexec. For this to work, + both source and target kernels need to have this option enabled. Why do we have the same kconfig entry twice? Here and x86. This was how the kexec config options were done when I wrote the patches originally. Since then, looks like Eric DeVolder has cleaned up things quite nicely. I'll adapt the new way. + config TRANS_TABLE def_bool y depends on HIBERNATION || KEXEC_CORE diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c index 417a8a86b2db..8035b673d96d 100644 --- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c @@ -346,6 +346,8 @@ void __init __no_sanitize_address setup_arch(char **cmdline_p) paging_init(); + kho_reserve_mem(); + acpi_table_upgrade(); /* Parse the ACPI tables for possible boot-time configuration */ diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 74c1db8ce271..254d82f3383a 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -358,6 +358,8 @@ void __init bootmem_init(void) */ arch_reserve_crashkernel(); + kho_reserve(); + reserve what? It is not obvious what the difference between kho_reserve_mem() and kho_reserve() are. Yeah, I agree. I was struggling to find good names for them. What they do is: kho_reserve() - Reserve CMA memory for later kexec. We use this memory region as scratch memory later. kho_reserve_mem() - Post-KHO. Creates memory reservations inside memblocks for pre-KHO handed over memory. For v2, I'll change them to kho_reserve_scratch() and kho_reserve_previous_mem() unless you have better ideas :) memblock_dump_all(); } @@ -386,6 +388,12 @@ void __init mem_init(void) /* this will put all unused low memory onto the freelists */ memblock_free_all(); + /* + * Now that all KHO pages are marked as reserved, let's flip them back + * to normal pages with accurate refcount. + */ + kho_populate_refcount(); + /* * Check boundaries twice: Some fundamental inconsistencies can be * detected at build time already. diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index bf502ba8da95..af95139351ed 100644 --- a/drivers/of/fdt.c +++ b/drivers/of/fdt.c @@ -1006,6 +1006,44 @@ void __init early_init_dt_check_for_usable_mem_range(void) memblock_add(rgn[i].base, rgn[i].size); } +/** + * early_init_dt_check_kho - Decode info required for kexec handover from DT + */ +void __init early_init_dt_check_kho(void) +{ +#ifdef CONFIG_KEXEC_KHO if (!IS_ENABLED(CONFIG_KEXEC_KHO)) return; You'll need a kho_populate() stub. Always happy to remove #ifdefs :) + unsigned long node = chosen_node_offset; + u64 kho_start, scratch_start, scratch_size, mem_start, mem_size; + const __be32 *p; + int l; + + if ((long)node < 0) + return; + + p = of_get_flat_dt_prop(node, "linux,kho-dt", ); + if (l != (dt_root_addr_cells + dt_root_size_cells) * sizeof(__be32)) + return; + + kho_start = dt_mem_next_cell(dt_root_addr_cells, ); + + p = of_get_flat_dt_prop(node, "linux,kho-scratch", ); + if (l != (dt_root_addr_cells + dt_root_size_cells) * sizeof(__be32)) + return; + + scratch_start = dt_mem_next_cell(dt_root_addr_cells, ); + scratch_size = dt_mem_next_cell(dt_root_addr_cells, ); + + p = of_get_flat_dt_prop(node, "linux,kho-mem", ); + if (l != (dt_root_addr_cells + dt_root_size_cells) * sizeof(__be32)) + return; + + mem_start = dt_mem_next_cell(dt_root_addr_cells, ); + mem_size = dt_mem_next_cell(dt_root_addr_cells, ); + + kho_populate(kho_start,
Re: [PATCH v5 3/3] x86/kexec: use pr_err() instead of pr_debug() when an error occurs
On Sun, 17 Dec 2023 11:35:28 +0800 Yuntao Wang wrote: > When an error is detected, use pr_err() instead of pr_debug() to output > log message. > > In addition, remove the unnecessary return from set_page_address(). > > ... > > --- a/arch/x86/kernel/kexec-bzimage64.c > +++ b/arch/x86/kernel/kexec-bzimage64.c > @@ -424,7 +424,7 @@ static void *bzImage64_load(struct kimage *image, char > *kernel, >* command line. Make sure it does not overflow >*/ > if (cmdline_len + MAX_ELFCOREHDR_STR_LEN > header->cmdline_size) { > - pr_debug("Appending elfcorehdr= to command line exceeds > maximum allowed length\n"); > + pr_err("Appending elfcorehdr= to command line exceeds > maximum allowed length\n"); > return ERR_PTR(-EINVAL); > } https://lkml.kernel.org/r/20231213055747.61826-4-...@redhat.com has already changed this to call kexec_dprintk(). I'll skip this patch. ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 2/2] crash_core: fix out-of-bounds access check in crash_exclude_mem_range()
On Mon, 18 Dec 2023 16:19:15 +0800 Yuntao Wang wrote: > mem->nr_ranges represents the current number of elements stored in > the mem->ranges array, and mem->max_nr_ranges represents the maximum number > of elements that the mem->ranges array can hold. Therefore, the correct > array out-of-bounds check should be mem->nr_ranges >= mem->max_nr_ranges. > This does not apply after your own "crash_core: fix and simplify the logic of crash_exclude_mem_range()". What should be done? ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[ANNOUNCE] kexec-tools v2.0.28 preparation
Hi all, I am planning to release kexec-tools v2.0.28 in the next two weeks to roughly coincide with the release of the v6.7 kernel. I would like to ask interested parties to send any patches they would like included in v2.0.28 within one week so they can be considered for inclusion in an rc release. For reference the patches queued up since v2.0.27 are as follows. Thanks to everyone who has contributed to kexec-tools! 549466430ae6 LoongArch: Load vmlinux.efi to the link address ba0ac0efe299 LoongArch: Fix an issue with relocatable vmlinux 74dfaefd6316 m68k: fix getrandom() use with uclibc 22dcf5cb940a lzma: Relax memory limit for lzma decompressor 44e7b73c331f kexec: ppc64: print help to stdout instead of stderr 74d66d405f30 workflow: update to Ubuntu 22.04 ab3a70af8567 kexec/loongarch64: fix 'make dist' file loss issue 6419b008fde7 kexec: provide a memfd_create() wrapper if not present in libc 118b567ce74a crashdump/x86: set the elfcorehdr segment size for hotplug d59d17f37239 crashdump/x86: identify elfcorehdr segment for hotplug a56376080a93 crashdump: exclude elfcorehdr segment from digest for hotplug 75ac71fd94ff crashdump: setup general hotplug support d6cfd2984844 crashdump: introduce the hotplug command line options c36d3e8b2e99 kexec: define KEXEC_UPDATE_ELFCOREHDR bd0200c47c45 kexec: update manpage with explicit mention of clean kexec 2495ccfc5206 zboot: add loongarch kexec_load support 8f08e3b51f25 zboot: enable arm64 kexec_load for zboot image c3f35ff06e54 build: fix tarball creation 056c179cd3c2 kexec-tools 2.0.27.git ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [RFC 0/3] kdump: Check mem_map of CMA area in kdump
On Mon 18-12-23 13:23:22, Pingfan Liu wrote: > From: Pingfan Liu > > > First of all, this series is only for proof of concept. It only passes > compilation. > > For years, CMA is proposed to be used as crashkernel reserved memory. > But DIO prevent us to follow it since DMA may be in-flight and ruin the > kdump kernel. > > This series exports the crash kernel's CMA area information through > device-tree, and kdump kernel skips any page, which refcnt!=mapcount and > has a potential DMA activity. I didn't have time to look deeper into implementation (and I will get back to it only early Jan) but mapcount based checks are really tricky and unreliable. folio_maybe_dma_pinned sounds like a better test. You definitely want to have that checked by more MM people and CC linux-mm. > The exported information include: > u64 kdump_cma_pfn; > u64 kdump_cma_pg_cnt; > u64 kdump_cma_pg_paddr; > > And they should be filled with Jiri's series "[PATCH 0/4] kdump: > crashkernel reservation from CMA" > > After the conjunction of two series, the CMA used for kdump has only the > following risk, where the following conditions: > -1.a wrong code forges _refcnt and mapcount to the same value > -2.the page is also used by DIO > > > Is it acceptable, or any rescue e.g. CRC on page? We alredy do have vm_debug=P which enables init time poisoning on all struct pages. The value is then checked when the page is allocated. > Please share your thoughts. Having a sanity check on exported cma pages makes some sense to me. The exact check might be more involved with false positives but they shouldn't be a major problem unless there are too many of them. -- Michal Hocko SUSE Labs ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 0/2] crash: fix potential cmem->ranges array overflow
Hi Yuntao, On 12/18/23 at 04:19pm, Yuntao Wang wrote: > This series tries to fix the potential cmem->ranges array overflow. This series looks good to me. While you'd better talk to fuqiang to ask if he wants to post these or wants to give up. He posted patch to raise the potention issue and I suggested him to do these during the discussion. Without consulting him for opinion to take over a discussing work, it's not suggested, I would say. https://lore.kernel.org/all/ZXrY7QbXAlxydsSC@MiWiFi-R3L-srv/T/#u > > Yuntao Wang (2): > x86/crash: fix potential cmem->ranges array overflow > crash_core: fix out-of-bounds access check in > crash_exclude_mem_range() > > arch/x86/kernel/crash.c | 9 + > kernel/crash_core.c | 2 +- > 2 files changed, 6 insertions(+), 5 deletions(-) > > -- > 2.43.0 > ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] kexec: do syscore_shutdown() in kernel_kexec
Hi Eric, On Wed, 2023-12-13 at 10:39 -0600, Eric W. Biederman wrote: > > James Gowans writes: > > > syscore_shutdown() runs driver and module callbacks to get the system > > into a state where it can be correctly shut down. In commit > > 6f389a8f1dd2 ("PM / reboot: call syscore_shutdown() after > > disable_nonboot_cpus()") > > syscore_shutdown() was removed from kernel_restart_prepare() and hence > > got (incorrectly?) removed from the kexec flow. This was innocuous until > > commit 6735150b6997 ("KVM: Use syscore_ops instead of reboot_notifier to > > hook restart/shutdown") > > changed the way that KVM registered its shutdown callbacks, switching from > > reboot notifiers to syscore_ops.shutdown. As syscore_shutdown() is > > missing from kexec, KVM's shutdown hook is not run and virtualisation is > > left enabled on the boot CPU which results in triple faults when > > switching to the new kernel on Intel x86 VT-x with VMXE enabled. > > > > Fix this by adding syscore_shutdown() to the kexec sequence. In terms of > > where to add it, it is being added after migrating the kexec task to the > > boot CPU, but before APs are shut down. It is not totally clear if this > > is the best place: in commit 6f389a8f1dd2 ("PM / reboot: call > > syscore_shutdown() after disable_nonboot_cpus()") > > it is stated that "syscore_ops operations should be carried with one > > CPU on-line and interrupts disabled." APs are only offlined later in > > machine_shutdown(), so this syscore_shutdown() is being run while APs > > are still online. This seems to be the correct place as it matches where > > syscore_shutdown() is run in the reboot and halt flows - they also run > > it before APs are shut down. The assumption is that the commit message > > in commit 6f389a8f1dd2 ("PM / reboot: call syscore_shutdown() after > > disable_nonboot_cpus()") > > is no longer valid. > > > > KVM has been discussed here as it is what broke loudly by not having > > syscore_shutdown() in kexec, but this change impacts more than just KVM; > > all drivers/modules which register a syscore_ops.shutdown callback will > > now be invoked in the kexec flow. Looking at some of them like x86 MCE > > it is probably more correct to also shut these down during kexec. > > Maintainers of all drivers which use syscore_ops.shutdown are added on > > CC for visibility. They are: > > > > arch/powerpc/platforms/cell/spu_base.c .shutdown = spu_shutdown, > > arch/x86/kernel/cpu/mce/core.c.shutdown = > > mce_syscore_shutdown, > > arch/x86/kernel/i8259.c .shutdown = i8259A_shutdown, > > drivers/irqchip/irq-i8259.c .shutdown = i8259A_shutdown, > > drivers/irqchip/irq-sun6i-r.c .shutdown = sun6i_r_intc_shutdown, > > drivers/leds/trigger/ledtrig-cpu.c.shutdown = > > ledtrig_cpu_syscore_shutdown, > > drivers/power/reset/sc27xx-poweroff.c .shutdown = sc27xx_poweroff_shutdown, > > kernel/irq/generic-chip.c .shutdown = irq_gc_shutdown, > > virt/kvm/kvm_main.c .shutdown = kvm_shutdown, > > > > This has been tested by doing a kexec on x86_64 and aarch64. > > From the 10,000 foot perspective: > Acked-by: "Eric W. Biederman" Thanks for the ACK! What's the next step to get this into the kexec tree? JG > > > Eric > > > Fixes: 6735150b6997 ("KVM: Use syscore_ops instead of reboot_notifier to > > hook restart/shutdown") > > > > Signed-off-by: James Gowans > > Cc: Eric Biederman > > Cc: Paolo Bonzini > > Cc: Sean Christopherson > > Cc: Marc Zyngier > > Cc: Arnd Bergmann > > Cc: Tony Luck > > Cc: Borislav Petkov > > Cc: Thomas Gleixner > > Cc: Ingo Molnar > > Cc: Chen-Yu Tsai > > Cc: Jernej Skrabec > > Cc: Samuel Holland > > Cc: Pavel Machek > > Cc: Sebastian Reichel > > Cc: Orson Zhai > > Cc: Alexander Graf > > Cc: Jan H. Schoenherr > > --- > > kernel/kexec_core.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > > index be5642a4ec49..b926c4db8a91 100644 > > --- a/kernel/kexec_core.c > > +++ b/kernel/kexec_core.c > > @@ -1254,6 +1254,7 @@ int kernel_kexec(void) > > kexec_in_progress = true; > > kernel_restart_prepare("kexec reboot"); > > migrate_to_reboot_cpu(); > > + syscore_shutdown(); > > > > /* > >* migrate_to_reboot_cpu() disables CPU hotplug assuming that ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH] kexec: avoid out of bounds in crash_exclude_mem_range()
在 2023/12/14 18:29, Baoquan He 写道: On 11/30/23 at 09:20pm, fuqiang wang wrote: On 2023/11/30 15:44, Baoquan He wrote: On 11/27/23 at 10:56am, fuqiang wang wrote: When the split happened, judge whether mem->nr_ranges is equal to mem->max_nr_ranges. If it is true, return -ENOMEM. The advantage of doing this is that it can avoid array bounds caused by some bugs. E.g., Before commit 4831be702b95 ("arm64/kexec: Fix missing extra range for crashkres_low."), reserve both high and low memories for the crashkernel may cause out of bounds. On the other hand, move this code before the split to ensure that the array will not be changed when return error. If out of array boundary is caused, means the laoding failed, whether the out of boundary happened or not. I don't see how this code change makes sense. Do I miss anything? Thanks Baoquan Hi baoquan, In some configurations, out of bounds may not cause crash_exclude_mem_range() returns error, then the load will succeed. E.g. There is a cmem before execute crash_exclude_mem_range(): cmem = { max_nr_ranges = 3 nr_ranges = 2 ranges = { {start = 1, end = 1000} {start = 1001, end = 2000} } } After executing twice crash_exclude_mem_range() with the start/end params 100/200, 300/400 respectively, the cmem will be: cmem = { max_nr_ranges = 3 nr_ranges = 4 <== nr_ranges > max_nr_ranges ranges = { {start = 1, end = 99 } {start = 201, end = 299 } {start = 401, end = 1000} {start = 1001, end = 2000} <== OUT OF BOUNDS } } Let me borrow your example and copy them here, but I will switch the order of start/end params 100/200, 300/400 executing at below: There is a cmem before execute crash_exclude_mem_range(): cmem = { max_nr_ranges = 3 nr_ranges = 2 ranges = { {start = 1, end = 1000} {start = 1001, end = 2000} } } After executing twice crash_exclude_mem_range() with the start/end params 300/400, the cmem will be: cmem = { max_nr_ranges = 3 nr_ranges = 3 <== nr_ranges == max_nr_ranges ranges = { {start = 1, end = 299 } i=0 {start = 401, end = 1000} i=1 {start = 1001, end = 2000} i=2 } } When it's executing the 100/200 excluding, we have: cmem = { max_nr_ranges = 3 nr_ranges = 4 <== nr_ranges > max_nr_ranges ranges = { {start = 1, end = 99 } i=0 {start = 401, end = 1000} {start = 1001, end = 2000} } } Then splitting happened, i == 0, then for loop is broken and jump out. Then we have the condition checking here: /* Split happened */ if (i == mem->max_nr_ranges - 1) return -ENOMEM; Obviously the conditonal checking is incorrect (given the i == 0 in above case), it should be /* Split happened */ if (mem->nr_ranges == mem->max_nr_ranges) return -ENOMEM; So, now there are two things which need be combed up in crash_exclude_mem_range(): 1) the above conditional check is incorrect, need be fixed; 2) whether we need have the cmem->ranges[] partly changed, or keep it unchanged when OOB happened; Hi baoquan, I'm sorry, I would like to confirm if I misunderstood the meaning of your comment or not. What you mean is that you have agreed to merge the patch, but before that, it needs to be explained in detail in the commit message. Is this understanding correct? And also the incorrect handling in crash_setup_memmap_entries(): 1) the insufficient array slot in crash_setup_memmap_entries(); 2) the uninitialized cmem->max_nr_ranges; If this patch can be merged, the issue of the uninitialized cmem->max_nr_ranges must be resolved before the patch is merged because this patch requires a initialized max_nr_ranges value. I am willing to take on the task of addressing those issues. Thanks fuqiang When an out of bounds occurs during the second execution, the function will not return error. Additionally, when the function returns error, means the load failed. It seems meaningless to keep the original data unchanged. But in my opinion, this will make this function more rigorous and more versatile. (However, I am not sure if it is self-defeating and I hope to receive more suggestions). Thanks fuqiang Signed-off-by: fuqiang wang --- kernel/crash_core.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/kernel/crash_core.c b/kernel/crash_core.c index efe87d501c8c..ffdc246cf425 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -611,6 +611,9 @@ int crash_exclude_mem_range(struct crash_mem *mem, } if (p_start > start && p_end < end) { + /* Split happened */ + if (mem->nr_ranges == mem->max_nr_ranges) + return
[PATCH 2/2] crash_core: fix out-of-bounds access check in crash_exclude_mem_range()
mem->nr_ranges represents the current number of elements stored in the mem->ranges array, and mem->max_nr_ranges represents the maximum number of elements that the mem->ranges array can hold. Therefore, the correct array out-of-bounds check should be mem->nr_ranges >= mem->max_nr_ranges. Signed-off-by: Yuntao Wang --- kernel/crash_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/crash_core.c b/kernel/crash_core.c index d4313b53837e..991494d4cf43 100644 --- a/kernel/crash_core.c +++ b/kernel/crash_core.c @@ -627,7 +627,7 @@ int crash_exclude_mem_range(struct crash_mem *mem, return 0; /* Split happened */ - if (i == mem->max_nr_ranges - 1) + if (mem->nr_ranges >= mem->max_nr_ranges) return -ENOMEM; /* Location where new range should go */ -- 2.43.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH 0/2] crash: fix potential cmem->ranges array overflow
This series tries to fix the potential cmem->ranges array overflow. Yuntao Wang (2): x86/crash: fix potential cmem->ranges array overflow crash_core: fix out-of-bounds access check in crash_exclude_mem_range() arch/x86/kernel/crash.c | 9 + kernel/crash_core.c | 2 +- 2 files changed, 6 insertions(+), 5 deletions(-) -- 2.43.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec
[PATCH 1/2] x86/crash: fix potential cmem->ranges array overflow
The max_nr_ranges field of cmem allocated in crash_setup_memmap_entries() is not initialized, its default value is 0. When elfcorehdr is allocated from the middle of crashk_res due to any potential reason, that is, `image->elf_load_addr > crashk_res.start && image->elf_load_addr + image->elf_headers_sz - 1 < crashk_res.end`, executing memmap_exclude_ranges() will cause a range split to occur in crash_exclude_mem_range(), which eventually leads to an overflow of the cmem->ranges array. Set cmem->max_nr_ranges to 1 to make crash_exclude_mem_range() return -ENOMEM instead of causing cmem->ranges array overflow even when a split happens. Signed-off-by: Yuntao Wang --- arch/x86/kernel/crash.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index c92d88680dbf..3be46f4b441e 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -282,10 +282,6 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params) struct crash_memmap_data cmd; struct crash_mem *cmem; - cmem = vzalloc(struct_size(cmem, ranges, 1)); - if (!cmem) - return -ENOMEM; - memset(, 0, sizeof(struct crash_memmap_data)); cmd.params = params; @@ -321,6 +317,11 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params) } /* Exclude some ranges from crashk_res and add rest to memmap */ + cmem = vzalloc(struct_size(cmem, ranges, 1)); + if (!cmem) + return -ENOMEM; + cmem->max_nr_ranges = 1; + ret = memmap_exclude_ranges(image, cmem, crashk_res.start, crashk_res.end); if (ret) goto out; -- 2.43.0 ___ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec