On Mon, May 11, 2026 at 07:30:44PM +0800, Jinjie Ruan wrote:
>
>
> On 5/11/2026 5:46 PM, Breno Leitao wrote:
> > On Mon, May 11, 2026 at 11:04:43AM +0800, Jinjie Ruan wrote:
> >> There is a race condition between the kexec_load() system call
> >> (crash kernel loading path) and memory hotplug operations that can
> >> lead to buffer overflow and potential kernel crash.
> >>
> >> During prepare_elf_headers(), the following steps occur:
> >> 1. The first for_each_mem_range() queries current System RAM memory ranges
> >> 2. Allocates buffer based on queried count
> >> 3. The 2st for_each_mem_range() populates ranges from memblock
> >>
> >> If memory hotplug occurs between step 1 and step 3, the number of ranges
> >> can increase, causing out-of-bounds write when populating cmem->ranges[].
> >>
> >> This happens because kexec_load() uses kexec_trylock (atomic_t) while
> >> memory hotplug uses device_hotplug_lock (mutex), so they don't serialize
> >> with each other.
> >>
> >> Add the explicit bounds checking to prevent out-of-bounds access.
> >
> > It seems you have a TOCTOU type of issue, and this seems to be shrinking
> > the window, but not fully solving it?
>
> Hi Breno,
>
> Thanks for your comments regarding the TOCTOU issue.
>
> You are correct that the current bounds checking only "shrinks the
> window" and prevents a kernel crash, but doesn't fully guarantee header
> consistency if a race occurs.
>
> In my local environment, this race is extremely difficult to reproduce,
> but it is theoretically possible.
>
> To address this properly for arm64, I am considering two steps:
>
> - For this patch: I will change the return value to -EAGAIN and keep the
>   bounds check. This ensures that even if a race happens, the kernel
>   remains safe (no OOB access), and user-space is notified to retry.
>
> - Long-term solution: A better way to solve this is to implement ARM64
>   CRASH_HOTPLUG support (similar to x86). With crash hotplug, the kernel
>   will automatically re-generate the crash headers whenever a memory
>   hotplug event occurs. This makes the TOCTOU during the initial
>   kexec_load less critical, as any transient inconsistency will be
>   immediately corrected by the subsequent hotplug handler.
>
> Does it make sense to you to use this patch as a safety guard first, and
> then I (or someone else) follow up with the full CRASH_HOTPLUG support
> for arm64 as [1]?

That would be OK with me, but please make it explicit that there is still a
TOCTOU issue, and that fully fixing it depends on CRASH_HOTPLUG support.
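
To make the "safety guard" behaviour concrete, here is a rough user-space
sketch of the count/allocate/populate pattern with the bounds check and the
-EAGAIN return discussed above. It is not the actual patch and not kernel
code: struct mem_ranges, the live[] table, simulate_hot_add() and
snapshot_ranges() are all hypothetical stand-ins for crash_mem, memblock,
a hot-add event and prepare_elf_headers().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's range/crash_mem types. */
struct range { unsigned long long start, end; };

struct mem_ranges {
	unsigned int max_nr_ranges;	/* capacity chosen in pass 1 */
	unsigned int nr_ranges;		/* entries filled in pass 2 */
	struct range ranges[];
};

/* Simulated "System RAM" table; a hot-add grows nr_live. */
static struct range live[8] = { { 0x0, 0xfff }, { 0x2000, 0x2fff } };
static unsigned int nr_live = 2;

/* Simulates a memory hot-add racing with the loader. */
static void simulate_hot_add(void)
{
	live[nr_live].start = 0x4000;
	live[nr_live].end = 0x4fff;
	nr_live++;
}

/*
 * Two-pass snapshot: count, allocate, populate. race_hook (may be NULL)
 * runs between the passes to model a hotplug event in the window.
 * Returns 0 on success, -ENOMEM on allocation failure, or -EAGAIN if
 * the table grew past the capacity chosen in pass 1.
 */
static int snapshot_ranges(struct mem_ranges **out, void (*race_hook)(void))
{
	unsigned int i, nr = nr_live;	/* pass 1: count */
	struct mem_ranges *m;

	assert(out != NULL);

	m = malloc(sizeof(*m) + nr * sizeof(m->ranges[0]));
	if (!m)
		return -ENOMEM;
	m->max_nr_ranges = nr;
	m->nr_ranges = 0;

	if (race_hook)
		race_hook();		/* the TOCTOU window */

	for (i = 0; i < nr_live; i++) {	/* pass 2: populate */
		/* Bounds check: never write past the allocated capacity. */
		if (m->nr_ranges >= m->max_nr_ranges) {
			free(m);
			return -EAGAIN;	/* caller should retry */
		}
		m->ranges[m->nr_ranges++] = live[i];
	}

	*out = m;
	return 0;
}
```

Note that the check only turns the overflow into a retryable error. The
window between the two passes is still there, which is the TOCTOU part
that would need CRASH_HOTPLUG (or real serialization) to close.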
