On Tue, Jun 23, 2026 at 3:05 PM Ritesh Harjani <[email protected]> wrote:
>
> Barry Song <[email protected]> writes:
>
> > On Fri, Jun 19, 2026 at 12:41 PM Ritesh Harjani (IBM)
> > <[email protected]> wrote:
> >>
> >> SWAP_NR_ORDERS sizes a few small bounded arrays inside THP swap
> >> allocator code (nofull/frag cluster lists, percpu_swap_cluster's
> >> si/offset arrays, next array for rotational device). This currently
> >> expands to PMD_ORDER+1, which only works when PMD_ORDER is a compile
> >> time constant.
> >>
> >> However on architecture like PowerPC Book3S64, PMD_ORDER is a runtime
> >> variable which depends upon which MMU is selected (Radix / Hash), so in
> >> that case, PMD_ORDER cannot be used to size the static arrays.
> >>
> >> This patch provides an optional ARCH_MAX_PMD_ORDER (upper-bound)
> >> override for such architectures. The memory overhead on enabling this
> >> override is negligible. Even if we make SWAP_NR_ORDERS runtime alloc,
> >> default slab padding could cause some memory waste. Also we lose the
> >> per-cpu cacheline benefits (for percpu_swap_cluster) because it might
> >> cost an extra cacheline indirection overhead in swap_alloc_fast() for
> >> fetching si[order]/offset[order]. Note that a fully runtime
> >> SWAP_NR_ORDERS was considered in previous version but was dropped for
> >> this reason [1]
> >
> > Do we know the maximum PMD size?
>
> ARCH_MAX_PMD_ORDER will be 8 on PowerPC book3s64 with 64K pagesize.
> PowerPC Hash MMU with 64K default pagesize supports PMD size of 16MB.
>
> > On arm64 with a 64 KB base page,
> > a PMD can be as large as 512 MB:
> > https://docs.kernel.org/arch/arm64/hugetlbpage.html
> >
> > One concern we have is that performing I/O on such a large folio could
> > incur significant latency before reclaiming any memory. For this
> > reason, on arm64 we initially enabled THP_SWAPOUT only for 4 KB base
> > pages:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d0637c505f
> >
>
> That's not the case on PowerPC. Max PMD size for Hash will be 16MB.

Yep. A 16 MB folio might be fine, although I'm not sure whether
splitting a 16 MB folio into eight 2 MB folios would help much.

For 512 MB PMD-sized pages on arm64, one possible approach might be to
split them into 256 × 2 MB folios rather than all the way down to 4 KB
pages. That could provide a better balance between I/O latency and swap
performance.

> Also we still need this patch since we can at runtime choose Hash or
> Radix MMU. So, the main problem this patch is trying to solve on PowerPC
> Book3s64 is enabling this feature w/o impacting any other architecture.
> W/O this patch series, we can't enable it, since it gives build errors.

I see. If possible, please mention in the changelog that the maximum
PMD size on your platform is 16 MB. In that case, the I/O latency
concerns I raised may not really apply.

With that, please feel free to add:

Reviewed-by: Barry Song <[email protected]>
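
For reference, the sizing pattern and the size arithmetic discussed in
this thread can be sketched in plain userspace C. This is only an
illustration under assumptions, not the actual patch: SKETCH_PAGE_SHIFT,
struct sketch_percpu_cluster, sketch_slot(), folio_bytes() and
nr_split_folios() are made-up names, and the #ifdef fallback merely
mirrors the idea described in the changelog. The order 13 used for arm64
is derived from the 512 MB figure above (512 MB / 64 KB = 2^13).

```c
#include <assert.h>

/* Hypothetical stand-in: 64K base pages (2^16 bytes). */
#define SKETCH_PAGE_SHIFT 16

/*
 * An architecture whose PMD order is only known at run time supplies a
 * compile-time upper bound instead. 8 is the value quoted in the thread
 * for PowerPC Book3S64 with 64K pages.
 */
#define ARCH_MAX_PMD_ORDER 8

#ifdef ARCH_MAX_PMD_ORDER
#define SWAP_NR_ORDERS (ARCH_MAX_PMD_ORDER + 1)
#else
#define SWAP_NR_ORDERS (PMD_ORDER + 1)	/* needs a constant PMD_ORDER */
#endif

/* Statically sized per-order array, loosely modelled on percpu_swap_cluster. */
struct sketch_percpu_cluster {
	unsigned long offset[SWAP_NR_ORDERS];
};

/* The run-time order must stay within the compile-time bound. */
static inline unsigned long *sketch_slot(struct sketch_percpu_cluster *c,
					 unsigned int order)
{
	assert(order < SWAP_NR_ORDERS);
	return &c->offset[order];
}

/* Size in bytes of a folio of the given order. */
static inline unsigned long long folio_bytes(unsigned int page_shift,
					     unsigned int order)
{
	return 1ULL << (page_shift + order);
}

/* Number of smaller folios obtained by splitting one large folio. */
static inline unsigned long long nr_split_folios(unsigned long long big,
						 unsigned long long small)
{
	return big / small;
}
```

With these helpers, folio_bytes(SKETCH_PAGE_SHIFT, 8) gives 16 MB for
the Hash MMU case, and splitting a 512 MB folio
(folio_bytes(SKETCH_PAGE_SHIFT, 13)) into 2 MB pieces yields 256 folios,
matching the numbers above.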
