On Mon, May 11, 2026 at 11:04:43AM +0800, Jinjie Ruan wrote:
> There is a race condition between the kexec_load() system call
> (crash kernel loading path) and memory hotplug operations that can
> lead to buffer overflow and potential kernel crash.
>
> During prepare_elf_headers(), the following steps occur:
> 1. The first for_each_mem_range() queries current System RAM memory ranges
> 2. Allocates buffer based on queried count
> 3. The 2st for_each_mem_range() populates ranges from memblock
>
> If memory hotplug occurs between step 1 and step 3, the number of ranges
> can increase, causing out-of-bounds write when populating cmem->ranges[].
>
> This happens because kexec_load() uses kexec_trylock (atomic_t) while
> memory hotplug uses device_hotplug_lock (mutex), so they don't serialize
> with each other.
>
> Add the explicit bounds checking to prevent out-of-bounds access.

It seems you have a TOCTOU type of issue, and this patch shrinks the
window, but does not fully solve it?

> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Baoquan He <[email protected]>
> Cc: Breno Leitao <[email protected]>
> Cc: [email protected]
> Fixes: 3751e728cef2 ("arm64: kexec_file: add crash dump support")
> Closes: https://sashiko.dev/#/patchset/20260323072745.2481719-1-ruanjinjie%40huawei.com
> Signed-off-by: Jinjie Ruan <[email protected]>
> ---
>  arch/arm64/kernel/machine_kexec_file.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> index e31fabed378a..a67e7b1abbab 100644
> --- a/arch/arm64/kernel/machine_kexec_file.c
> +++ b/arch/arm64/kernel/machine_kexec_file.c
> @@ -59,6 +59,11 @@ static int prepare_elf_headers(void **addr, unsigned long *sz)
>  	cmem->max_nr_ranges = nr_ranges;
>  	cmem->nr_ranges = 0;
>  	for_each_mem_range(i, &start, &end) {
> +		if (cmem->nr_ranges >= cmem->max_nr_ranges) {
> +			ret = -ENOMEM;

-ENOMEM seems to be the wrong errno. This isn't an allocation failure;
it's a transient race. -EBUSY or -EAGAIN would be more honest
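
To make the errno point concrete, here is a rough userspace model of the count/allocate/populate pattern. This is only a sketch and not kernel code: struct crash_mem_model, struct mem_source, alloc_ranges() and fill_ranges() are hypothetical stand-ins for struct crash_mem, memblock and the two for_each_mem_range() passes. The populate step refuses to write past the allocation and reports -EAGAIN, so the caller can tell "the range list grew under us" apart from a real allocation failure.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct range {
	unsigned long long start;
	unsigned long long end;
};

/* Hypothetical userspace model of struct crash_mem. */
struct crash_mem_model {
	unsigned int max_nr_ranges;
	unsigned int nr_ranges;
	struct range ranges[];
};

/* Hypothetical stand-in for memblock: a range list whose count may grow. */
struct mem_source {
	const struct range *ranges;
	unsigned int count;
};

/* Steps 1 and 2: size the buffer from the count observed right now. */
static struct crash_mem_model *alloc_ranges(unsigned int nr)
{
	struct crash_mem_model *cmem;

	cmem = calloc(1, sizeof(*cmem) + nr * sizeof(struct range));
	if (cmem)
		cmem->max_nr_ranges = nr;
	return cmem;
}

/*
 * Step 3: populate. If the source grew since the buffer was sized,
 * stop before writing out of bounds and report a transient condition
 * (-EAGAIN), not an allocation failure (-ENOMEM).
 */
static int fill_ranges(struct crash_mem_model *cmem,
		       const struct mem_source *src)
{
	unsigned int i;

	cmem->nr_ranges = 0;
	for (i = 0; i < src->count; i++) {
		if (cmem->nr_ranges >= cmem->max_nr_ranges)
			return -EAGAIN;
		cmem->ranges[cmem->nr_ranges++] = src->ranges[i];
	}
	return 0;
}
```

In this model a caller that sees -EAGAIN can free the buffer, recount and try again. The check alone only turns the overflow into an error; the window between counting and populating is still there unless the two sides are serialized.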
