On 04/11/25 10:48, Sourabh Jain wrote:
On 03/11/25 15:40, Ritesh Harjani (IBM) wrote:Sourabh Jain <[email protected]> writes:Commit 35c18f2933c5 ("Add a new optional ",cma" suffix to the crashkernel= command line option") and commit ab475510e042 ("kdump: implement reserve_crashkernel_cma") added CMA support for kdump crashkernel reservation. Extend crashkernel CMA reservation support to powerpc. The following changes are made to enable CMA reservation on powerpc:- Parse and obtain the CMA reservation size along with other crashkernelparameters - Call reserve_crashkernel_cma() to allocate the CMA region for kdump - Include the CMA-reserved ranges in the usable memory ranges for the kdump kernel to use. - Exclude the CMA-reserved ranges from the crash kernel memory to prevent them from being exported through /proc/vmcore. With the introduction of the CMA crashkernel regions, crash_exclude_mem_range() needs to be called multiple times to exclude both crashk_res and crashk_cma_ranges from the crash memory ranges. To avoid repetitive logic for validating mem_ranges size and handlingreallocation when required, this functionality is moved to a new wrapperfunction crash_exclude_mem_range_guarded(). To ensure proper CMA reservation, reserve_crashkernel_cma() is called after pageblock_order is initialized. Update kernel-parameters.txt to document CMA support for crashkernel on powerpc architecture. Cc: Baoquan he <[email protected]> Cc: Jiri Bohac <[email protected]> Cc: Hari Bathini <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Mahesh Salgaonkar <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Ritesh Harjani (IBM) <[email protected]> Cc: Shivang Upadhyay <[email protected]> Cc: [email protected] Signed-off-by: Sourabh Jain <[email protected]> --- Changlog: v3 -> v4 - Removed repeated initialization to tmem in crash_exclude_mem_range_guarded() - Call crash_exclude_mem_range() with right crashk ranges v4 -> v5:- Document CMA-based crashkernel support for ppc64 in kernel-parameters.txt--- .../admin-guide/kernel-parameters.txt | 2 +- arch/powerpc/include/asm/kexec.h | 2 + arch/powerpc/kernel/setup-common.c | 4 +- arch/powerpc/kexec/core.c | 10 ++++-arch/powerpc/kexec/ranges.c | 43 ++++++++++++++-----5 files changed, 47 insertions(+), 14 deletions(-)diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txtindex 6c42061ca20e..0f386b546cec 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1013,7 +1013,7 @@ It will be ignored when crashkernel=X,high is not used or memory reserved is below 4G. crashkernel=size[KMG],cma - [KNL, X86] Reserve additional crash kernel memory from+ [KNL, X86, ppc64] Reserve additional crash kernel memory fromShouldn't this be PPC and not ppc64? If I see the crash_dump support... config ARCH_SUPPORTS_CRASH_DUMP def_bool PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP) The changes below aren't specific to ppc64 correct?The thing is this feature is only supported with KEXEC_FILE and which only supported on PPC64:config ARCH_SUPPORTS_KEXEC_FILE def_bool PPC64 Hence I kept it as ppc64. I think I should update that in the commit message. Also do you think is it good to restrict this feature to KEXEC_FILE?
Putting this under KEXEC_FILE may not help much because KEXEC_FILE is enabled by default in most configurations. Once it is enabled, the CMA reservation will
happen regardless of which system call is used to load the kdump kernel (kexec_load or kexec_file_load).However, not restricting this feature to KEXEC_FILE will allow the kexec tool to
independently add support for this feature in the future for the kexec_load system call.With that logic, I think if we do not restrict this feature to KEXEC_FILE, the support
will be available for ppc and not limited to ppc64.
CMA. This reservation is usable by the first system's userspace memory and kernel movable allocations (memory balloon, zswap). Pages allocated from this memory rangediff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.hindex 4bbf9f699aaa..bd4a6c42a5f3 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h@@ -115,9 +115,11 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt, struct crash_mem#ifdef CONFIG_CRASH_RESERVEint __init overlaps_crashkernel(unsigned long start, unsigned long size);extern void arch_reserve_crashkernel(void); +extern void kdump_cma_reserve(void); #else static inline void arch_reserve_crashkernel(void) {}static inline int overlaps_crashkernel(unsigned long start, unsigned long size) { return 0; }+static inline void kdump_cma_reserve(void) { } #endif #if defined(CONFIG_CRASH_DUMP)diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.cindex 68d47c53876c..c8c42b419742 100644 --- a/arch/powerpc/kernel/setup-common.c +++ b/arch/powerpc/kernel/setup-common.c @@ -35,6 +35,7 @@ #include <linux/of_irq.h> #include <linux/hugetlb.h> #include <linux/pgtable.h> +#include <asm/kexec.h> #include <asm/io.h> #include <asm/paca.h> #include <asm/processor.h> @@ -995,11 +996,12 @@ void __init setup_arch(char **cmdline_p) initmem_init(); /*- * Reserve large chunks of memory for use by CMA for fadump, KVM and + * Reserve large chunks of memory for use by CMA for kdump, fadump, KVM and* hugetlb. These must be called after initmem_init(), so that * pageblock_order is initialised. */ fadump_cma_init(); + kdump_cma_reserve(); kvm_cma_reserve(); gigantic_hugetlb_cma_reserve(); diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.c index d1a2d755381c..25744737eff5 100644 --- a/arch/powerpc/kexec/core.c +++ b/arch/powerpc/kexec/core.c @@ -33,6 +33,8 @@ void machine_kexec_cleanup(struct kimage *image) { } +unsigned long long cma_size; +nit: Since this is a gloabal powerpc variable you are defining, then can we keep it's name to crashk_cma_size?Yeah make sense. I will update the variable name./* * Do not allocate memory (or fail in any way) in machine_kexec(). * We are past the point of no return, committed to rebooting now. @@ -110,7 +112,7 @@ void __init arch_reserve_crashkernel(void) /* use common parsing */ret = parse_crashkernel(boot_command_line, total_mem_sz, &crash_size,- &crash_base, NULL, NULL, NULL); + &crash_base, NULL, &cma_size, NULL); if (ret) return; @@ -130,6 +132,12 @@ void __init arch_reserve_crashkernel(void) reserve_crashkernel_generic(crash_size, crash_base, 0, false); } +void __init kdump_cma_reserve(void) +{ + if (cma_size) + reserve_crashkernel_cma(cma_size); +} +nit: cma_size is already checked for null within reserve_crashkernel_cma(), so we don't really need kdump_cma_reserve() function call as such. Also kdump_cma_reserve() only make sense with #ifdef CRASHKERNEL_CMA..so instead do you think we can directly call reserve_crashkernel_cma(cma_size)?I think the above kdump_cma_reserve() definition should come under CONFIG_CRASH_RESERVEbecause the way it is declared in arch/powerpc/include/asm/kexec.h. I would like to keep kdump_cma_reserve() as is it because of two reasons: - It keeps setup_arch() free from kdump #ifdefs- In case if we want to add some condition on this reservation it would straight forward.So lets keep kdump_cma_reserve as is, unless you have strong opinion on not to.int __init overlaps_crashkernel(unsigned long start, unsigned long size){return (start + size) > crashk_res.start && start <= crashk_res.end;diff --git a/arch/powerpc/kexec/ranges.c b/arch/powerpc/kexec/ranges.c index 3702b0bdab14..3bd27c38726b 100644 --- a/arch/powerpc/kexec/ranges.c +++ b/arch/powerpc/kexec/ranges.c@@ -515,7 +515,7 @@ int get_exclude_memory_ranges(struct crash_mem **mem_ranges)*/ int get_usable_memory_ranges(struct crash_mem **mem_ranges) { - int ret; + int ret, i; /** Early boot failure observed on guests when low memory (first memory @@ -528,6 +528,13 @@ int get_usable_memory_ranges(struct crash_mem **mem_ranges)if (ret) goto out; + for (i = 0; i < crashk_cma_cnt; i++) { + ret = add_mem_range(mem_ranges, crashk_cma_ranges[i].start,+ crashk_cma_ranges[i].end - crashk_cma_ranges[i].start + 1);+ if (ret) + goto out; + } + ret = add_rtas_mem_range(mem_ranges); if (ret) goto out;@@ -546,6 +553,22 @@ int get_usable_memory_ranges(struct crash_mem **mem_ranges)#endif /* CONFIG_KEXEC_FILE */ #ifdef CONFIG_CRASH_DUMP+static int crash_exclude_mem_range_guarded(struct crash_mem **mem_ranges,+ unsigned long long mstart, + unsigned long long mend) +{ + struct crash_mem *tmem = *mem_ranges; ++ /* Reallocate memory ranges if there is no space to split ranges */+ if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) { + tmem = realloc_mem_ranges(mem_ranges); + if (!tmem) + return -ENOMEM; + } + + return crash_exclude_mem_range(tmem, mstart, mend); +} + /*** get_crash_memory_ranges - Get crash memory ranges. This list includes * first/crashing kernel's memory regions that @@ -557,7 +580,6 @@ int get_usable_memory_ranges(struct crash_mem **mem_ranges)int get_crash_memory_ranges(struct crash_mem **mem_ranges) { phys_addr_t base, end; - struct crash_mem *tmem; u64 i; int ret;@@ -582,19 +604,18 @@ int get_crash_memory_ranges(struct crash_mem **mem_ranges)sort_memory_ranges(*mem_ranges, true); }- /* Reallocate memory ranges if there is no space to split ranges */- tmem = *mem_ranges; - if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) { - tmem = realloc_mem_ranges(mem_ranges); - if (!tmem) - goto out; - } - /* Exclude crashkernel region */- ret = crash_exclude_mem_range(tmem, crashk_res.start, crashk_res.end); + ret = crash_exclude_mem_range_guarded(mem_ranges, crashk_res.start, crashk_res.end);if (ret) goto out; + for (i = 0; i < crashk_cma_cnt; ++i) {+ ret = crash_exclude_mem_range_guarded(mem_ranges, crashk_cma_ranges[i].start,+ crashk_cma_ranges[i].end); + if (ret) + goto out; + } + /** FIXME: For now, stay in parity with kexec-tools but if RTAS/OPAL * regions are exported to save their context at the time of-- 2.51.0
