On Mon, Nov 24 2025, H. Peter Anvin wrote:

> On November 24, 2025 11:24:58 AM PST, Usama Arif <[email protected]> 
> wrote:
>>
>>
>>On 09/05/2025 08:46, Changyuan Lyu wrote:
>>> From: Alexander Graf <[email protected]>
>>> 
>>> KHO kernels are special and use only scratch memory for memblock
>>> allocations, but memory below 1M is ignored by kernel after early boot
>>> and cannot be naturally marked as scratch.
>>> 
>>> To allow allocation of the real-mode trampoline and a few (if any) other
>>> very early allocations from below 1M forcibly mark the memory below 1M
>>> as scratch.
>>> 
>>> After real mode trampoline is allocated, clear that scratch marking.
>>> 
>>> Signed-off-by: Alexander Graf <[email protected]>
>>> Co-developed-by: Mike Rapoport (Microsoft) <[email protected]>
>>> Signed-off-by: Mike Rapoport (Microsoft) <[email protected]>
>>> Co-developed-by: Changyuan Lyu <[email protected]>
>>> Signed-off-by: Changyuan Lyu <[email protected]>
>>> Acked-by: Dave Hansen <[email protected]>
>>> ---
>>>  arch/x86/kernel/e820.c   | 18 ++++++++++++++++++
>>>  arch/x86/realmode/init.c |  2 ++
>>>  2 files changed, 20 insertions(+)
>>> 
>>> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
>>> index 9920122018a0b..c3acbd26408ba 100644
>>> --- a/arch/x86/kernel/e820.c
>>> +++ b/arch/x86/kernel/e820.c
>>> @@ -1299,6 +1299,24 @@ void __init e820__memblock_setup(void)
>>>             memblock_add(entry->addr, entry->size);
>>>     }
>>>  
>>> +   /*
>>> +    * At this point memblock is only allowed to allocate from memory
>>> +    * below 1M (aka ISA_END_ADDRESS) up until direct map is completely set
>>> +    * up in init_mem_mapping().
>>> +    *
>>> +    * KHO kernels are special and use only scratch memory for memblock
>>> +    * allocations, but memory below 1M is ignored by kernel after early
>>> +    * boot and cannot be naturally marked as scratch.
>>> +    *
>>> +    * To allow allocation of the real-mode trampoline and a few (if any)
>>> +    * other very early allocations from below 1M forcibly mark the memory
>>> +    * below 1M as scratch.
>>> +    *
>>> +    * After real mode trampoline is allocated, we clear that scratch
>>> +    * marking.
>>> +    */
>>> +   memblock_mark_kho_scratch(0, SZ_1M);
>>> +
>>>     /*
>>>      * 32-bit systems are limited to 4BG of memory even with HIGHMEM and
>>>      * to even less without it.
>>> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
>>> index f9bc444a3064d..9b9f4534086d2 100644
>>> --- a/arch/x86/realmode/init.c
>>> +++ b/arch/x86/realmode/init.c
>>> @@ -65,6 +65,8 @@ void __init reserve_real_mode(void)
>>>      * setup_arch().
>>>      */
>>>     memblock_reserve(0, SZ_1M);
>>> +
>>> +   memblock_clear_kho_scratch(0, SZ_1M);
>>>  }
>>>  
>>>  static void __init sme_sev_setup_real_mode(struct trampoline_header *th)
>>
>>Hello!
>>
>>I am working with Breno, who reported that we are seeing the warning below
>>at boot when rolling out 6.16 in the Meta fleet. It is difficult to
>>reproduce manually on a single host, but we see it several times a day
>>across the fleet.
>>
>> 20:16:33  ------------[ cut here ]------------
>> 20:16:33  WARNING: CPU: 0 PID: 0 at mm/memblock.c:668 memblock_add_range+0x316/0x330
>> 20:16:33  Modules linked in:
>> 20:16:33  CPU: 0 UID: 0 PID: 0 Comm: swapper Tainted: G S                  6.16.1-0_fbk0_0_gc0739ee5037a #1 NONE
>> 20:16:33  Tainted: [S]=CPU_OUT_OF_SPEC
>> 20:16:33  RIP: 0010:memblock_add_range+0x316/0x330
>> 20:16:33  Code: ff ff ff 89 5c 24 08 41 ff c5 44 89 6c 24 10 48 63 74 24 08 48 63 54 24 10 e8 26 0c 00 00 e9 41 ff ff ff 0f 0b e9 af fd ff ff <0f> 0b e9 b7 fd ff ff 0f 0b 0f 0b cc cc cc cc cc cc cc cc cc cc cc
>> 20:16:33  RSP: 0000:ffffffff83403dd8 EFLAGS: 00010083 ORIG_RAX: 0000000000000000
>> 20:16:33  RAX: ffffffff8476ff90 RBX: 0000000000001c00 RCX: 0000000000000002
>> 20:16:33  RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffffffff83bad4d8
>> 20:16:33  RBP: 000000000009f000 R08: 0000000000000020 R09: 8000000000097101
>> 20:16:33  R10: ffffffffff2004b0 R11: 203a6d6f646e6172 R12: 000000000009ec00
>> 20:16:33  R13: 0000000000000002 R14: 0000000000100000 R15: 000000000009d000
>> 20:16:33  FS:  0000000000000000(0000) GS:0000000000000000(0000) knlGS:0000000000000000
>> 20:16:33  CR2: ffff888065413ff8 CR3: 00000000663b7000 CR4: 00000000000000b0
>> 20:16:33  Call Trace:
>> 20:16:33   <TASK>
>> 20:16:33   ? __memblock_reserve+0x75/0x80
>> 20:16:33   ? setup_arch+0x30f/0xb10
>> 20:16:33   ? start_kernel+0x58/0x960
>> 20:16:33   ? x86_64_start_reservations+0x20/0x20
>> 20:16:33   ? x86_64_start_kernel+0x13d/0x140
>> 20:16:33   ? common_startup_64+0x13e/0x140
>> 20:16:33   </TASK>
>> 20:16:33  ---[ end trace 0000000000000000 ]--- 
>>
>>
>>Rolling out with memblock=debug is not really an option in a large-scale
>>fleet due to the time it adds to boot. But I did try it on one of the hosts
>>(without reproducing the issue) and I see:
>>
>>[    0.000616]  memory.cnt  = 0x6
>>[    0.000617]  memory[0x0]   [0x0000000000001000-0x000000000009bfff], 0x000000000009b000 bytes flags: 0x40
>>[    0.000620]  memory[0x1]   [0x000000000009f000-0x000000000009ffff], 0x0000000000001000 bytes flags: 0x40
>>[    0.000621]  memory[0x2]   [0x0000000000100000-0x000000005ed09fff], 0x000000005ec0a000 bytes flags: 0x0
>>...
>>
>>The 0x40 (MEMBLOCK_KHO_SCRATCH) is coming from memblock_mark_kho_scratch()
>>in e820__memblock_setup(). I believe this should be under an #ifdef, like
>>the diff at the end? (Happy to send this as a patch for review if it makes
>>sense.) We have KEXEC_HANDOVER disabled in our defconfig, therefore
>>MEMBLOCK_KHO_SCRATCH shouldn't be selected and we shouldn't have any
>>MEMBLOCK_KHO_SCRATCH type regions in our memblock reservations.
>>
>>The other thing I did was insert a while(1) just before the warning and
>>inspect the registers in qemu. R14 held the base and R15 held the size at
>>that point. In the warning R14 is 0x100000, meaning that someone is
>>reserving a region with a flag other than MEMBLOCK_NONE at the boundary of
>>MEMBLOCK_KHO_SCRATCH.
>>
>>diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
>>index c3acbd26408ba..26e4062a0bd09 100644
>>--- a/arch/x86/kernel/e820.c
>>+++ b/arch/x86/kernel/e820.c
>>@@ -1299,6 +1299,7 @@ void __init e820__memblock_setup(void)
>>                memblock_add(entry->addr, entry->size);
>>        }
>> 
>>+#ifdef CONFIG_MEMBLOCK_KHO_SCRATCH
>>        /*
>>         * At this point memblock is only allowed to allocate from memory
>>         * below 1M (aka ISA_END_ADDRESS) up until direct map is completely set
>>@@ -1316,7 +1317,7 @@ void __init e820__memblock_setup(void)
>>         * marking.
>>         */
>>        memblock_mark_kho_scratch(0, SZ_1M);
>>-
>>+#endif
>>        /*
>>         * 32-bit systems are limited to 4BG of memory even with HIGHMEM and
>>         * to even less without it.
>>diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
>>index 88be32026768c..1cd80293a3e23 100644
>>--- a/arch/x86/realmode/init.c
>>+++ b/arch/x86/realmode/init.c
>>@@ -66,8 +66,9 @@ void __init reserve_real_mode(void)
>>         * setup_arch().
>>         */
>>        memblock_reserve(0, SZ_1M);
>>-
>>+#ifdef CONFIG_MEMBLOCK_KHO_SCRATCH
>>        memblock_clear_kho_scratch(0, SZ_1M);
>>+#endif
>> }
>> 
>> static void __init sme_sev_setup_real_mode(struct trampoline_header *th)
>
> What does "scratch" mean in this exact context? (Sorry, don't have the code 
> in front of me.)

See https://docs.kernel.org/core-api/kho/concepts.html#scratch-regions
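
For readers without the code at hand, here is a rough userspace sketch of
what the scratch marking in the quoted patch amounts to. It is a toy model,
not kernel code: every toy_* name is made up for illustration, the region
list is copied from the memblock=debug dump quoted above, and the flag
values 0x0 and 0x40 are the ones that dump reports. The real memblock code
also splits regions at range boundaries, which this sketch skips.

```c
/* Toy model only: this is NOT kernel code and NOT the memblock API. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as they appear in the dump quoted in this thread. */
#define TOY_NONE        0x0u
#define TOY_KHO_SCRATCH 0x40u
#define TOY_SZ_1M       0x100000ull

struct toy_region {
	uint64_t base;
	uint64_t size;
	unsigned int flags;
};

/* First three memory regions from the dump, before any marking. */
static struct toy_region toy_memory[] = {
	{ 0x0000000000001000ull, 0x000000000009b000ull, TOY_NONE },
	{ 0x000000000009f000ull, 0x0000000000001000ull, TOY_NONE },
	{ 0x0000000000100000ull, 0x000000005ec0a000ull, TOY_NONE },
};
#define TOY_NR (sizeof(toy_memory) / sizeof(toy_memory[0]))

/*
 * Set or clear a flag on every region that lies fully inside
 * [base, base + size). Simplification: regions are never split.
 */
static void toy_setclr_flag(uint64_t base, uint64_t size,
			    unsigned int flag, bool set)
{
	uint64_t end = base + size;

	for (size_t i = 0; i < TOY_NR; i++) {
		struct toy_region *r = &toy_memory[i];

		if (r->base < base || r->base + r->size > end)
			continue;
		if (set)
			r->flags |= flag;
		else
			r->flags &= ~flag;
	}
}

static void toy_mark_kho_scratch(uint64_t base, uint64_t size)
{
	toy_setclr_flag(base, size, TOY_KHO_SCRATCH, true);
}

static void toy_clear_kho_scratch(uint64_t base, uint64_t size)
{
	toy_setclr_flag(base, size, TOY_KHO_SCRATCH, false);
}

/*
 * Per the patch description, a KHO kernel allocates only from scratch
 * memory; scratch_only models that mode.
 */
static bool toy_can_alloc_from(const struct toy_region *r, bool scratch_only)
{
	return !scratch_only || (r->flags & TOY_KHO_SCRATCH);
}
```

Calling toy_mark_kho_scratch(0, TOY_SZ_1M) leaves the two regions below 1M
with flags 0x40 and the region starting at 0x100000 with 0x0, which is the
pattern in the dump. toy_clear_kho_scratch(0, TOY_SZ_1M) undoes it, as
reserve_real_mode() does in the patch once the trampoline is allocated.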

-- 
Regards,
Pratyush Yadav
