On Fri, Jun 19, 2026 at 12:41 PM Ritesh Harjani (IBM) <[email protected]> wrote:
>
> SWAP_NR_ORDERS sizes a few small bounded arrays inside THP swap
> allocator code (nofull/frag cluster lists, percpu_swap_cluster's
> si/offset arrays, next array for rotational device). This currently
> expands to PMD_ORDER+1, which only works when PMD_ORDER is a compile
> time constant.
>
> However on architecture like PowerPC Book3S64, PMD_ORDER is a runtime
> variable which depends upon which MMU is selected (Radix / Hash), so in
> that case, PMD_ORDER cannot be used to size the static arrays.
>
> This patch provides an optional ARCH_MAX_PMD_ORDER (upper-bound)
> override for such architectures. The memory overhead on enabling this
> override is negligible. Even if we make SWAP_NR_ORDERS runtime alloc,
> default slab padding could cause some memory waste. Also we lose the
> per-cpu cacheline benefits (for percpu_swap_cluster) because it might
> cost an extra cacheline indirection overhead in swap_alloc_fast() for
> fetching si[order]/offset[order]. Note that a fully runtime
> SWAP_NR_ORDERS was considered in previous version but was dropped for
> this reason [1]

Do we know the maximum PMD size? On arm64 with a 64 KB base page,
a PMD can be as large as 512 MB:

https://docs.kernel.org/arch/arm64/hugetlbpage.html

One concern we have is that performing I/O on such a large folio
could incur significant latency before reclaiming any memory.
For this reason, on arm64 we initially enabled THP_SWAPOUT only
for 4 KB base pages:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d0637c505f

>
> [1]: https://lore.kernel.org/linuxppc-dev/[email protected]/
>

Best Regards
Barry
