Christophe Leroy <[email protected]> writes:

> Le 10/11/2025 à 12:27, David Hildenbrand (Red Hat) a écrit :
>> Thanks for the review!
>> 
>>>
>>> So I think what you want instead is:
>>>
>>> diff --git a/arch/powerpc/platforms/Kconfig.cputype
>>> b/arch/powerpc/platforms/Kconfig.cputype
>>> index 7b527d18aa5ee..1f5a1e587740c 100644
>>> --- a/arch/powerpc/platforms/Kconfig.cputype
>>> +++ b/arch/powerpc/platforms/Kconfig.cputype
>>> @@ -276,6 +276,7 @@ config PPC_E500
>>>           select FSL_EMB_PERFMON
>>>           bool
>>>           select ARCH_SUPPORTS_HUGETLBFS if PHYS_64BIT || PPC64
>>> +       select ARCH_HAS_GIGANTIC_PAGE if ARCH_SUPPORTS_HUGETLBFS
>>>           select PPC_SMP_MUXED_IPI
>>>           select PPC_DOORBELL
>>>           select PPC_KUEP
>>>
>>>
>>>
>>>>        select ARCH_HAS_KCOV
>>>>        select ARCH_HAS_KERNEL_FPU_SUPPORT    if PPC64 && PPC_FPU
>>>>        select ARCH_HAS_MEMBARRIER_CALLBACKS
>>>> diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/
>>>> platforms/Kconfig.cputype
>>>> index 7b527d18aa5ee..4c321a8ea8965 100644
>>>> --- a/arch/powerpc/platforms/Kconfig.cputype
>>>> +++ b/arch/powerpc/platforms/Kconfig.cputype
>>>> @@ -423,7 +423,6 @@ config PPC_64S_HASH_MMU
>>>>    config PPC_RADIX_MMU
>>>>        bool "Radix MMU Support"
>>>>        depends on PPC_BOOK3S_64
>>>> -    select ARCH_HAS_GIGANTIC_PAGE
>>>
>>> Should remain I think.
>>>
>>>>        default y
>>>>        help
>>>>          Enable support for the Power ISA 3.0 Radix style MMU. Currently
>> 
>> 
>> We also have PPC_8xx do a
>> 
>>      select ARCH_SUPPORTS_HUGETLBFS
>> 
>> And of course !PPC_RADIX_MMU (e.g., PPC_64S_HASH_MMU) through 
>> PPC_BOOK3S_64.
>> 
>> Are we sure they cannot end up with gigantic folios through hugetlb?
>> 
>
> Yes indeed. My PPC_8xx is OK because I set CONFIG_ARCH_FORCE_MAX_ORDER=9 
> (largest hugepage is 8M) but I do get the warning with the default value 
> which is 8 (with 16k pages).
>
> For PPC_64S_HASH_MMU, the max page size is 16M; we get no warning with 
> CONFIG_ARCH_FORCE_MAX_ORDER=8, which is the default value, but we do get 
> the warning with CONFIG_ARCH_FORCE_MAX_ORDER=7.
>

This got me thinking... Currently we can also get the warning on
book3s64 when CONFIG_PPC_RADIX_MMU=n, because the maximum hugepage size
in the HASH case can be 16G. I guess this was not getting caught in
regular CI because it requires disabling the RADIX config at build time.

On HASH we end up in the path below, where MAX_PAGE_ORDER is
CONFIG_ARCH_FORCE_MAX_ORDER (i.e. 8), because we have
ARCH_HAS_GIGANTIC_PAGE=n when only HASH is enabled.

So from the snippet below, MAX_FOLIO_ORDER on !PPC_RADIX_MMU (HASH) becomes 8...

    #if !defined(CONFIG_ARCH_HAS_GIGANTIC_PAGE)
    /*
     * We don't expect any folios that exceed buddy sizes (and consequently
     * memory sections).
     */
    #define MAX_FOLIO_ORDER             MAX_PAGE_ORDER

...and thus we get a similar warning in hugetlb_add_hstate(), because the
16G hstate has order 18 (with 64K base pages), which is greater than
MAX_FOLIO_ORDER (8).
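
Just to sanity-check the arithmetic, here is a tiny standalone user-space
sketch (not kernel code; the 64K base page size and the MAX_FOLIO_ORDER /
CONFIG_ARCH_FORCE_MAX_ORDER value of 8 are simply the numbers from the
discussion above):

    #include <stdio.h>

    /* order of a huge page = hugepage shift - base page shift */
    static unsigned long hpage_order(unsigned long hpage_shift,
                                     unsigned long page_shift)
    {
            return hpage_shift - page_shift;
    }

    int main(void)
    {
            const unsigned long page_shift = 16;      /* 64K base pages */
            const unsigned long max_folio_order = 8;  /* MAX_PAGE_ORDER on HASH */
            const struct { const char *name; unsigned long shift; } sizes[] = {
                    { "16M", 24 }, { "16G", 34 },
            };

            for (int i = 0; i < 2; i++) {
                    unsigned long order = hpage_order(sizes[i].shift, page_shift);

                    printf("%s: order %lu -> %s\n", sizes[i].name, order,
                           order > max_folio_order ? "WARN" : "ok");
            }
            return 0;
    }

This prints "16M: order 8 -> ok" and "16G: order 18 -> WARN", which lines up
with the observations above.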

[    0.000000] Kernel command line: console=hvc0 console=hvc1 
systemd.unit=emergency.target root=/dev/vda1 noreboot disable_radix=1 
hugepagesz=16M hugepages=1 hugepagesz=16G hugepages=1 default_hugepagesz=16G
<...>
[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: CPU: 0 PID: 0 at mm/hugetlb.c:4753 
hugetlb_add_hstate+0xf4/0x228
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 
6.18.0-rc3-00138-g1e87cdb8702c #26 NONE
[    0.000000] Hardware name: IBM PowerNV (emulated by qemu) POWER10 0x801200 
opal:v7.1-106-g785a5e307 PowerNV
[    0.000000] NIP:  c00000000204ef4c LR: c00000000204f1b0 CTR: c00000000204ee68
[    0.000000] REGS: c000000002857ad0 TRAP: 0700   Not tainted  
(6.18.0-rc3-00138-g1e87cdb8702c)
[    0.000000] MSR:  9000000002021033 <SF,HV,VEC,ME,IR,DR,RI,LE>  CR: 28000448  
XER: 00000000
[    0.000000] CFAR: c00000000204eed8 IRQMASK: 3
<...>
[    0.000000] NIP [c00000000204ef4c] hugetlb_add_hstate+0xf4/0x228
[    0.000000] LR [c00000000204f1b0] hugepagesz_setup+0x130/0x16c
[    0.000000] Call Trace:
[    0.000000] [c000000002857d70] [c0000000020ee564] 
hstate_cmdline_buf+0x4/0x800 (unreliable)
[    0.000000] [c000000002857e10] [c00000000204f1b0] 
hugepagesz_setup+0x130/0x16c
[    0.000000] [c000000002857e80] [c0000000020505a8] 
hugetlb_bootmem_alloc+0xd8/0x1d0
[    0.000000] [c000000002857ec0] [c000000002046828] mm_core_init+0x2c/0x254
[    0.000000] [c000000002857f30] [c0000000020012ac] start_kernel+0x404/0xae0
[    0.000000] [c000000002857fe0] [c00000000000e934] start_here_common+0x1c/0x20
<...>
[    2.557050] HugeTLB: allocation took 7ms with hugepage_allocation_threads=1
[    2.562263] ------------[ cut here ]------------
[    2.564482] WARNING: CPU: 0 PID: 1 at mm/internal.h:758 
gather_bootmem_prealloc_parallel+0x454/0x4d8
[    2.568266] Modules linked in:
[    2.570204] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Tainted: G        W         
  6.18.0-rc3-00138-g1e87cdb8702c #26 NONE
[    2.574570] Tainted: [W]=WARN
[    2.576009] Hardware name: IBM PowerNV (emulated by qemu) POWER10 0x801200 
opal:v7.1-106-g785a5e307 PowerNV
[    2.579979] NIP:  c00000000204f9b0 LR: c00000000204f870 CTR: c00000000204f55c
[    2.582763] REGS: c000000004a0f5a0 TRAP: 0700   Tainted: G        W          
  (6.18.0-rc3-00138-g1e87cdb8702c)
[    2.586670] MSR:  9000000002029033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 
44002288  XER: 20040000
[    2.590234] CFAR: c00000000204f880 IRQMASK: 0
<...>
[    2.616926] NIP [c00000000204f9b0] 
gather_bootmem_prealloc_parallel+0x454/0x4d8
[    2.619928] LR [c00000000204f870] 
gather_bootmem_prealloc_parallel+0x314/0x4d8
[    2.622799] Call Trace:
[    2.624068] [c000000004a0f840] [c00000000204f85c] 
gather_bootmem_prealloc_parallel+0x300/0x4d8 (unreliable)
[    2.627847] [c000000004a0f930] [c000000002041018] 
padata_do_multithreaded+0x470/0x518
[    2.631141] [c000000004a0fad0] [c00000000204fce8] hugetlb_init+0x2b4/0x904
[    2.633914] [c000000004a0fc10] [c000000000010d74] do_one_initcall+0xac/0x438
[    2.636761] [c000000004a0fcf0] [c000000002001dfc] 
kernel_init_freeable+0x3cc/0x720
[    2.639764] [c000000004a0fde0] [c000000000011344] kernel_init+0x34/0x260
[    2.642688] [c000000004a0fe50] [c00000000000debc] 
ret_from_kernel_user_thread+0x14/0x1c
[    2.646020] ---- interrupt: 0 at 0x0
[    2.647943] Code: eba100d8 ebc100e0 ebe100e8 e9410058 e92d0c70 7d4a4a79 
39200000 40820044 382100f0 eaa1ffa8 4e800020 60420000 <0fe00000> 4bfffed0 
3ba00000 7ee4bb78
[    2.654240] irq event stamp: 50400
[    2.655991] hardirqs last  enabled at (50399): [<c00000000002ed84>] 
interrupt_exit_kernel_prepare+0xd8/0x224
[    2.659759] hardirqs last disabled at (50400): [<c00000000002bdb8>] 
program_check_exception+0x60/0x78
[    2.663293] softirqs last  enabled at (50320): [<c00000000017aa0c>] 
handle_softirqs+0x5a8/0x5c0
[    2.666819] softirqs last disabled at (50315): [<c0000000000165e4>] 
do_softirq_own_stack+0x40/0x54
[    2.670569] ---[ end trace 0000000000000000 ]---
[    2.697258] HugeTLB: registered 16.0 MiB page size, pre-allocated 1 pages
[    2.700831] HugeTLB: 0 KiB vmemmap can be freed for a 16.0 MiB page
[    2.703917] HugeTLB: registered 16.0 GiB page size, pre-allocated 1 pages
[    2.707073] HugeTLB: 0 KiB vmemmap can be freed for a 16.0 GiB page


So I guess making PPC select ARCH_HAS_GIGANTIC_PAGE whenever
ARCH_SUPPORTS_HUGETLBFS is selected should resolve this warning w.r.t. the
order. And the runtime allocation of gigantic pages is anyway controlled
separately, via __HAVE_ARCH_GIGANTIC_PAGE_RUNTIME_SUPPORTED /
gigantic_page_runtime_supported().
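
In other words, something along these lines (just an illustrative, hand-written
sketch against the config PPC hunk quoted above, not a generated or tested
diff; the exact placement, and whether the condition should instead go on a
select the patch already adds there, would need checking):

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ config PPC
+	select ARCH_HAS_GIGANTIC_PAGE		if ARCH_SUPPORTS_HUGETLBFS
 	select ARCH_HAS_KCOV
 	select ARCH_HAS_KERNEL_FPU_SUPPORT	if PPC64 && PPC_FPU
 	select ARCH_HAS_MEMBARRIER_CALLBACKS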

Feel free to correct me here if I missed anything. There seems to be a
lot of history related to hugetlb / gigantic pages.

-ritesh
