On 5/11/2026 5:46 PM, Breno Leitao wrote:
> On Mon, May 11, 2026 at 11:04:43AM +0800, Jinjie Ruan wrote:
>> There is a race condition between the kexec_load() system call
>> (crash kernel loading path) and memory hotplug operations that can
>> lead to buffer overflow and potential kernel crash.
>>
>> During prepare_elf_headers(), the following steps occur:
>> 1. The first for_each_mem_range() queries current System RAM memory ranges
>> 2. Allocates buffer based on queried count
>> 3. The second for_each_mem_range() populates ranges from memblock
>>
>> If memory hotplug occurs between step 1 and step 3, the number of ranges
>> can increase, causing out-of-bounds write when populating cmem->ranges[].
>>
>> This happens because kexec_load() uses kexec_trylock (atomic_t) while
>> memory hotplug uses device_hotplug_lock (mutex), so they don't serialize
>> with each other.
>>
>> Add the explicit bounds checking to prevent out-of-bounds access.
> 
> It seems you have a TOCTOU type of issue, and this seems to be shrinking
> the window, but not fully solving it?

I plan to fix this issue as follows, and would appreciate your feedback
on whether this is reasonable.

Sashiko AI code review pointed out there is a TOCTOU (Time-of-Check to
Time-of-Use) race condition in prepare_elf_headers() between the initial
pass that counts System RAM ranges and the second pass that populates them.
If a memory hotplug event occurs between these two steps, the number of
memory regions may increase, causing an out-of-bounds write to
the cmem->ranges[] array.

To resolve this and ensure data consistency, this patch:

1. Wraps the counting and population passes with get_online_mems() and
   crash_hotplug_lock(). This serializes the kexec_file_load() path
   with concurrent memory hotplug operations, ensuring the memory
   map remains consistent throughout the header preparation.

2. Adds an explicit boundary check in prepare_elf64_ram_headers_callback().
   If the number of ranges exceeds the allocated maximum, it now returns
   -EAGAIN, which indicates a transient race, signaling userspace
   kexec-tools to retry the syscall instead of leaving the system
   without a loaded crash kernel.
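
To make the failure mode in point 2 concrete, here is a minimal
user-space sketch of the count-then-populate pattern with the bounds
check. It is not the kernel code: struct mem_source is a hypothetical
stand-in for memblock, and struct range_table only mimics the fields
of struct crash_mem that matter here.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct range {
	unsigned long long start, end;
};

/* Simplified model of struct crash_mem (illustrative only). */
struct range_table {
	unsigned int max_nr_ranges;
	unsigned int nr_ranges;
	struct range ranges[];
};

/* Hypothetical stand-in for memblock: a region list that may grow. */
struct mem_source {
	const struct range *regions;
	size_t nr;
};

/* Pass 1: count the regions that are visible right now. */
static unsigned int count_ranges(const struct mem_source *src)
{
	return (unsigned int)src->nr;
}

/* Allocate a table sized for the count taken in pass 1. */
static struct range_table *alloc_table(unsigned int nr)
{
	struct range_table *t;

	t = malloc(sizeof(*t) + nr * sizeof(t->ranges[0]));
	if (!t)
		return NULL;
	t->max_nr_ranges = nr;
	t->nr_ranges = 0;
	return t;
}

/*
 * Pass 2: copy the regions. If the source grew after pass 1, stop
 * before writing past the allocation and report a transient error.
 */
static int populate_ranges(struct range_table *t,
			   const struct mem_source *src)
{
	size_t i;

	for (i = 0; i < src->nr; i++) {
		if (t->nr_ranges >= t->max_nr_ranges)
			return -EAGAIN;
		t->ranges[t->nr_ranges].start = src->regions[i].start;
		t->ranges[t->nr_ranges].end = src->regions[i].end - 1;
		t->nr_ranges++;
	}
	return 0;
}
```

If src->nr grows between count_ranges() and populate_ranges(), the
check returns -EAGAIN after filling max_nr_ranges entries instead of
writing one element past the allocation.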

index daf81a873bbd..546be6261177 100644
--- a/arch/arm64/kernel/machine_kexec_file.c
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -15,6 +15,7 @@
 #include <linux/kexec.h>
 #include <linux/libfdt.h>
 #include <linux/memblock.h>
+#include <linux/memory_hotplug.h>
 #include <linux/of.h>
 #include <linux/of_fdt.h>
 #include <linux/slab.h>
@@ -40,7 +41,7 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image)
 }

 #ifdef CONFIG_CRASH_DUMP
-int prepare_elf_headers(void **addr, unsigned long *sz)
+static int __prepare_elf_headers(void **addr, unsigned long *sz)
 {
        struct crash_mem *cmem;
        unsigned int nr_ranges;
@@ -59,6 +60,11 @@ int prepare_elf_headers(void **addr, unsigned long *sz)
        cmem->max_nr_ranges = nr_ranges;
        cmem->nr_ranges = 0;
        for_each_mem_range(i, &start, &end) {
+               if (cmem->nr_ranges >= cmem->max_nr_ranges) {
+                       ret = -EAGAIN;
+                       goto out;
+               }
+
                cmem->ranges[cmem->nr_ranges].start = start;
                cmem->ranges[cmem->nr_ranges].end = end - 1;
                cmem->nr_ranges++;
@@ -81,6 +87,21 @@ int prepare_elf_headers(void **addr, unsigned long *sz)
        kfree(cmem);
        return ret;
 }
+
+int prepare_elf_headers(void **addr, unsigned long *sz)
+{
+       int ret;
+
+       crash_hotplug_lock();
+       get_online_mems();
+
+       ret = __prepare_elf_headers(addr, sz);
+
+       put_online_mems();
+       crash_hotplug_unlock();
+
+       return ret;
+}
 #endif
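
For the userspace side, a bounded retry on -EAGAIN could look roughly
like the sketch below. This is only an illustration of the retry
idea, not what kexec-tools does today: load_with_retry(), load_fn and
fake_load() are hypothetical names, and the callback is assumed to
return a negative errno rather than setting errno as a real syscall
wrapper would.

```c
#include <errno.h>

/* Hypothetical loader callback: returns 0 or a negative errno. */
typedef int (*load_fn)(void *ctx);

/*
 * Call load() up to max_attempts times, retrying only while it
 * reports -EAGAIN. Any other result is returned immediately.
 */
static int load_with_retry(load_fn load, void *ctx, int max_attempts)
{
	int ret = -EINVAL;	/* returned if max_attempts <= 0 */
	int attempt;

	for (attempt = 0; attempt < max_attempts; attempt++) {
		ret = load(ctx);
		if (ret != -EAGAIN)
			return ret;
	}
	return ret;
}

/* Demo stub: fails with -EAGAIN a fixed number of times. */
struct fake_state {
	int failures_left;
	int calls;
};

static int fake_load(void *ctx)
{
	struct fake_state *s = ctx;

	s->calls++;
	if (s->failures_left > 0) {
		s->failures_left--;
		return -EAGAIN;
	}
	return 0;
}
```

Bounding the number of attempts keeps a persistently racing hotplug
workload from turning the retry into an unbounded loop.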

> 
>> Cc: Catalin Marinas <[email protected]>
>> Cc: Will Deacon <[email protected]>
>> Cc: Andrew Morton <[email protected]>
>> Cc: Baoquan He <[email protected]>
>> Cc: Breno Leitao <[email protected]>
>> Cc: [email protected]
>> Fixes: 3751e728cef2 ("arm64: kexec_file: add crash dump support")
>> Closes: https://sashiko.dev/#/patchset/20260323072745.2481719-1-ruanjinjie%40huawei.com
>> Signed-off-by: Jinjie Ruan <[email protected]>
>> ---
>>  arch/arm64/kernel/machine_kexec_file.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>>
>> diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
>> index e31fabed378a..a67e7b1abbab 100644
>> --- a/arch/arm64/kernel/machine_kexec_file.c
>> +++ b/arch/arm64/kernel/machine_kexec_file.c
>> @@ -59,6 +59,11 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)
>>      cmem->max_nr_ranges = nr_ranges;
>>      cmem->nr_ranges = 0;
>>      for_each_mem_range(i, &start, &end) {
>> +            if (cmem->nr_ranges >= cmem->max_nr_ranges) {
>> +                    ret = -ENOMEM;
> 
> -ENOMEM seems to be the wrong errno. This isn't an allocation
> failure; it's a transient race. -EBUSY or -EAGAIN would be more honest.

