> On May 14, 2026, at 15:51, Oscar Salvador <[email protected]> wrote:
> 
> On Wed, May 13, 2026 at 09:04:29PM +0800, Muchun Song wrote:
>> Commit 622026e87c40 ("mm/hugetlb: remove fake head pages") switched
>> HVO to reuse per-zone shared tail pages from zone->vmemmap_tails[].
>> 
>> Those shared tail pages were initialized in hugetlb_vmemmap_init(), but
>> bootmem HugeTLB folios are prepared earlier from gather_bootmem_prealloc().
>> With hugetlb_free_vmemmap=on, prep_and_add_bootmem_folios() can access
>> pageblock flags on bootmem HugeTLB pages whose mirrored tail struct pages
>> already point to the shared tail page. On CONFIG_DEBUG_VM kernels,
>> get_pfnblock_bitmap_bitidx() then dereferences the still-uninitialized
>> shared tail page and can panic during boot.
>> 
>> Initialize zone->vmemmap_tails[] from gather_bootmem_prealloc(), before
>> bootmem HugeTLB folios are processed, and drop the later initialization
>> from hugetlb_vmemmap_init().
>> 
>> This bug only affects CONFIG_DEBUG_VM kernels, where the relevant
>> assertion is evaluated.
>> 
>> Fixes: 622026e87c40 ("mm/hugetlb: remove fake head pages")
>> Signed-off-by: Muchun Song <[email protected]>
> 
> For the correctness of the change
> 
> Acked-by: Oscar Salvador <[email protected]>

Thanks.

> 
> but I have a couple of comments:
> 
>> ---
>> mm/hugetlb.c         | 19 +++++++++++++++++++
>> mm/hugetlb_vmemmap.c | 17 -----------------
>> 2 files changed, 19 insertions(+), 17 deletions(-)
>> 
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 31b34ca0f402..d22683ab30a1 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -3382,6 +3382,25 @@ static void __init gather_bootmem_prealloc(void)
>> .max_threads = num_node_state(N_MEMORY),
>> .numa_aware = true,
>> };
>> +#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
>> +    struct zone *zone;
>> +
>> +    for_each_zone(zone) {
>> +            for (int i = 0; i < NR_VMEMMAP_TAILS; i++) {
>> +                    struct page *tail, *p;
>> +                    unsigned int order;
>> +
>> +                    tail = zone->vmemmap_tails[i];
>> +                    if (!tail)
>> +                            continue;
>> +
>> +                    order = i + VMEMMAP_TAIL_MIN_ORDER;
>> +                    p = page_to_virt(tail);
>> +                    for (int j = 0; j < PAGE_SIZE / sizeof(struct page); 
>> j++)
>> +                            init_compound_tail(p + j, NULL, order, zone);
>> +            }
>> +    }
> 
> This deserves a comment why do we need to initialize those pages here, no 
> need for
> a fat one but a hint, because everybody else looking at this will wonder why 
> do not
> have it in hugetlb_vmemmap_init(), as the name suggests.

Make sense.

> 
> 
>> +#endif
>> 
>>      padata_do_multithreaded(&job);
>> }
>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>> index 4a077d231d3a..62e61af18c9a 100644
>> --- a/mm/hugetlb_vmemmap.c
>> +++ b/mm/hugetlb_vmemmap.c
>> @@ -870,27 +870,10 @@ static const struct ctl_table 
>> hugetlb_vmemmap_sysctls[] = {
>> static int __init hugetlb_vmemmap_init(void)
> 
> At this point, hugetlb_vmemmap_init only initialites the sysctls for
> each hstate, should the name reflect that better?

A good point. I can add a new cleanup patch.

> 
> Also, vmemmap_get_tail() still thinks that tail pages are initialized
> here from this comment:

Good catch. I'll fix it.

> 
> "/*
>   * Only allocate the page, but do not initialize it.
>   *
>   * Any initialization done here will be overwritten by memmap_init().
>   *
>   * hugetlb_vmemmap_init() will take care of initialization after
>   * memmap_init().
>   */"
> 
> so we might want to adjust that as well, and I am not sure if we have
> more comments in the tree referencing hugetlb_vmemmap_init() as the init
> point for those pages that need correction.

Only this one.

> 
> 
> Having said that, and this is just a question, we cannot really make
> gather_bootmem_prealloc() also do the initialization of the sysfs right?
> I see that hugetlb sysfs gets initialized from hugetlb_init() a few
> calls after, so.. meh.

hugetlb_vmemmap_init() is introduced by me. I want to hide some symbols
from mm/hugetlb_vmemmap.c at that time. So I used late_initcall() for that.

> 
> Anyway, maybe just convert hugetlb_vmemmap_init to 
> hugetlb_vmemmap_sysfs_create
> (pick a better name), so it does not look like having two entry points
> for vmemmap init stuff.

This bug fix is a temporary change, the whole zone initialization will
removed later in patch 30. Then, there will be only one place for vmemmap init 
:)

I’m curious about your thoughts on the practice of minimizing external
symbols. Personally, I’d prefer to move these initializations into
their respective files:

    - hugetlb_sysfs_init();
    - hugetlb_cgroup_file_init();
    - hugetlb_sysctl_init();

Of course, that’s just my personal preference.

Muchun,
Thanks.


> 
> -- 
> Oscar Salvador
> SUSE Labs



Reply via email to