On 03/11/25 15:40, Ritesh Harjani (IBM) wrote:
Sourabh Jain <[email protected]> writes:Commit 35c18f2933c5 ("Add a new optional ",cma" suffix to the crashkernel= command line option") and commit ab475510e042 ("kdump: implement reserve_crashkernel_cma") added CMA support for kdump crashkernel reservation. Extend crashkernel CMA reservation support to powerpc. The following changes are made to enable CMA reservation on powerpc: - Parse and obtain the CMA reservation size along with other crashkernel parameters - Call reserve_crashkernel_cma() to allocate the CMA region for kdump - Include the CMA-reserved ranges in the usable memory ranges for the kdump kernel to use. - Exclude the CMA-reserved ranges from the crash kernel memory to prevent them from being exported through /proc/vmcore. With the introduction of the CMA crashkernel regions, crash_exclude_mem_range() needs to be called multiple times to exclude both crashk_res and crashk_cma_ranges from the crash memory ranges. To avoid repetitive logic for validating mem_ranges size and handling reallocation when required, this functionality is moved to a new wrapper function crash_exclude_mem_range_guarded(). To ensure proper CMA reservation, reserve_crashkernel_cma() is called after pageblock_order is initialized. Update kernel-parameters.txt to document CMA support for crashkernel on powerpc architecture. Cc: Baoquan he <[email protected]> Cc: Jiri Bohac <[email protected]> Cc: Hari Bathini <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Mahesh Salgaonkar <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Ritesh Harjani (IBM) <[email protected]> Cc: Shivang Upadhyay <[email protected]> Cc: [email protected] Signed-off-by: Sourabh Jain <[email protected]> --- Changlog: v3 -> v4 - Removed repeated initialization to tmem in crash_exclude_mem_range_guarded() - Call crash_exclude_mem_range() with right crashk ranges v4 -> v5: - Document CMA-based crashkernel support for ppc64 in kernel-parameters.txt --- .../admin-guide/kernel-parameters.txt | 2 +- arch/powerpc/include/asm/kexec.h | 2 + arch/powerpc/kernel/setup-common.c | 4 +- arch/powerpc/kexec/core.c | 10 ++++- arch/powerpc/kexec/ranges.c | 43 ++++++++++++++----- 5 files changed, 47 insertions(+), 14 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 6c42061ca20e..0f386b546cec 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1013,7 +1013,7 @@ It will be ignored when crashkernel=X,high is not used or memory reserved is below 4G. crashkernel=size[KMG],cma - [KNL, X86] Reserve additional crash kernel memory from + [KNL, X86, ppc64] Reserve additional crash kernel memory fromShouldn't this be PPC and not ppc64? If I see the crash_dump support... config ARCH_SUPPORTS_CRASH_DUMP def_bool PPC64 || PPC_BOOK3S_32 || PPC_85xx || (44x && !SMP) The changes below aren't specific to ppc64 correct?
The thing is this feature is only supported with KEXEC_FILE and which only supported on PPC64:
config ARCH_SUPPORTS_KEXEC_FILE def_bool PPC64 Hence I kept it as ppc64. I think I should update that in the commit message. Also do you think is it good to restrict this feature to KEXEC_FILE?
CMA. This reservation is usable by the first system's userspace memory and kernel movable allocations (memory balloon, zswap). Pages allocated from this memory range diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h index 4bbf9f699aaa..bd4a6c42a5f3 100644 --- a/arch/powerpc/include/asm/kexec.h +++ b/arch/powerpc/include/asm/kexec.h @@ -115,9 +115,11 @@ int setup_new_fdt_ppc64(const struct kimage *image, void *fdt, struct crash_mem #ifdef CONFIG_CRASH_RESERVE int __init overlaps_crashkernel(unsigned long start, unsigned long size); extern void arch_reserve_crashkernel(void); +extern void kdump_cma_reserve(void); #else static inline void arch_reserve_crashkernel(void) {} static inline int overlaps_crashkernel(unsigned long start, unsigned long size) { return 0; } +static inline void kdump_cma_reserve(void) { } #endif#if defined(CONFIG_CRASH_DUMP)diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c index 68d47c53876c..c8c42b419742 100644 --- a/arch/powerpc/kernel/setup-common.c +++ b/arch/powerpc/kernel/setup-common.c @@ -35,6 +35,7 @@ #include <linux/of_irq.h> #include <linux/hugetlb.h> #include <linux/pgtable.h> +#include <asm/kexec.h> #include <asm/io.h> #include <asm/paca.h> #include <asm/processor.h> @@ -995,11 +996,12 @@ void __init setup_arch(char **cmdline_p) initmem_init();/*- * Reserve large chunks of memory for use by CMA for fadump, KVM and + * Reserve large chunks of memory for use by CMA for kdump, fadump, KVM and * hugetlb. These must be called after initmem_init(), so that * pageblock_order is initialised. */ fadump_cma_init(); + kdump_cma_reserve(); kvm_cma_reserve(); gigantic_hugetlb_cma_reserve();diff --git a/arch/powerpc/kexec/core.c b/arch/powerpc/kexec/core.cindex d1a2d755381c..25744737eff5 100644 --- a/arch/powerpc/kexec/core.c +++ b/arch/powerpc/kexec/core.c @@ -33,6 +33,8 @@ void machine_kexec_cleanup(struct kimage *image) { }+unsigned long long cma_size;+nit: Since this is a gloabal powerpc variable you are defining, then can we keep it's name to crashk_cma_size?
Yeah make sense. I will update the variable name.
/* * Do not allocate memory (or fail in any way) in machine_kexec(). * We are past the point of no return, committed to rebooting now. @@ -110,7 +112,7 @@ void __init arch_reserve_crashkernel(void)/* use common parsing */ret = parse_crashkernel(boot_command_line, total_mem_sz, &crash_size, - &crash_base, NULL, NULL, NULL); + &crash_base, NULL, &cma_size, NULL);if (ret)return; @@ -130,6 +132,12 @@ void __init arch_reserve_crashkernel(void) reserve_crashkernel_generic(crash_size, crash_base, 0, false); }+void __init kdump_cma_reserve(void)+{ + if (cma_size) + reserve_crashkernel_cma(cma_size); +} +nit: cma_size is already checked for null within reserve_crashkernel_cma(), so we don't really need kdump_cma_reserve() function call as such. Also kdump_cma_reserve() only make sense with #ifdef CRASHKERNEL_CMA.. so instead do you think we can directly call reserve_crashkernel_cma(cma_size)?
I think the above kdump_cma_reserve() definition should come under CONFIG_CRASH_RESERVE
because the way it is declared in arch/powerpc/include/asm/kexec.h. I would like to keep kdump_cma_reserve() as is it because of two reasons: - It keeps setup_arch() free from kdump #ifdefs- In case if we want to add some condition on this reservation it would straight forward.
So lets keep kdump_cma_reserve as is, unless you have strong opinion on not to.
int __init overlaps_crashkernel(unsigned long start, unsigned long size) { return (start + size) > crashk_res.start && start <= crashk_res.end; diff --git a/arch/powerpc/kexec/ranges.c b/arch/powerpc/kexec/ranges.c index 3702b0bdab14..3bd27c38726b 100644 --- a/arch/powerpc/kexec/ranges.c +++ b/arch/powerpc/kexec/ranges.c @@ -515,7 +515,7 @@ int get_exclude_memory_ranges(struct crash_mem **mem_ranges) */ int get_usable_memory_ranges(struct crash_mem **mem_ranges) { - int ret; + int ret, i;/** Early boot failure observed on guests when low memory (first memory @@ -528,6 +528,13 @@ int get_usable_memory_ranges(struct crash_mem **mem_ranges) if (ret) goto out;+ for (i = 0; i < crashk_cma_cnt; i++) {+ ret = add_mem_range(mem_ranges, crashk_cma_ranges[i].start, + crashk_cma_ranges[i].end - crashk_cma_ranges[i].start + 1); + if (ret) + goto out; + } + ret = add_rtas_mem_range(mem_ranges); if (ret) goto out; @@ -546,6 +553,22 @@ int get_usable_memory_ranges(struct crash_mem **mem_ranges) #endif /* CONFIG_KEXEC_FILE */#ifdef CONFIG_CRASH_DUMP+static int crash_exclude_mem_range_guarded(struct crash_mem **mem_ranges, + unsigned long long mstart, + unsigned long long mend) +{ + struct crash_mem *tmem = *mem_ranges; + + /* Reallocate memory ranges if there is no space to split ranges */ + if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) { + tmem = realloc_mem_ranges(mem_ranges); + if (!tmem) + return -ENOMEM; + } + + return crash_exclude_mem_range(tmem, mstart, mend); +} + /** * get_crash_memory_ranges - Get crash memory ranges. This list includes * first/crashing kernel's memory regions that @@ -557,7 +580,6 @@ int get_usable_memory_ranges(struct crash_mem **mem_ranges) int get_crash_memory_ranges(struct crash_mem **mem_ranges) { phys_addr_t base, end; - struct crash_mem *tmem; u64 i; int ret;@@ -582,19 +604,18 @@ int get_crash_memory_ranges(struct crash_mem **mem_ranges)sort_memory_ranges(*mem_ranges, true); }- /* Reallocate memory ranges if there is no space to split ranges */- tmem = *mem_ranges; - if (tmem && (tmem->nr_ranges == tmem->max_nr_ranges)) { - tmem = realloc_mem_ranges(mem_ranges); - if (!tmem) - goto out; - } - /* Exclude crashkernel region */ - ret = crash_exclude_mem_range(tmem, crashk_res.start, crashk_res.end); + ret = crash_exclude_mem_range_guarded(mem_ranges, crashk_res.start, crashk_res.end); if (ret) goto out;+ for (i = 0; i < crashk_cma_cnt; ++i) {+ ret = crash_exclude_mem_range_guarded(mem_ranges, crashk_cma_ranges[i].start, + crashk_cma_ranges[i].end); + if (ret) + goto out; + } + /* * FIXME: For now, stay in parity with kexec-tools but if RTAS/OPAL * regions are exported to save their context at the time of -- 2.51.0
