On Tue, Jun 23, 2026 at 3:12 PM Ritesh Harjani <[email protected]> wrote:
>
> Barry Song <[email protected]> writes:
>
> > On Fri, Jun 19, 2026 at 12:41 PM Ritesh Harjani (IBM)
> > <[email protected]> wrote:
> >>
> >> THP_SWAP avoids splitting of a transparent huge folio into 32 smaller
> >> 64K folios (Radix-64K pagesize / 2M PMD) or into 256 smaller 64K folios
> >> (Hash-64K pagesize / 16M PMD), during swapout. This improves the
> >> swapping performance since all the bookkeeping & I/O submission happens
> >> once per large folio. More details at [1].
> >>
> >> PowerPC Book3S64 could not enable this before because PMD_ORDER is
> >> selected at runtime depending upon the chosen MMU. The earlier patches
> >> in this series turn SWAPFILE_CLUSTER into a runtime value and introduce
> >> an ARCH_MAX_PMD_ORDER upper bound override for SWAP_NR_ORDERS. With those
> >> changes, we can now enable THP SWAP for Book3S64.
> >>
> >> This increases bandwidth throughput with zram backend for swapout by
> >> 40-50% with Radix and 100-130% with Hash (Tested by Sayali)
> >
> > Thanks!
> >
> > I am curious about the contents of the anonymous memory being tested
> > and the compression algorithm used by zram.
> >
>
> I am sure it was derived from your microbenchmark itself which you had
> shared here (so repetitive pattern) with default zram compression
> algorithm. Thanks for that :)
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d0637c505f
>
> I think I got your point - I can mention that it was a microbenchmark
> similar to yours and not a real world workload test. Is this what you
> meant here?

Yep. Please make it clear in the changelog what kind of workload was
used, as different data can result in completely different compression
ratios and compression/decompression costs. Consequently, the reported
swap-out and swap-in performance improvements can vary significantly
as well.

w/ that, please feel free to add:

Reviewed-by: Barry Song <[email protected]>
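
To illustrate the point about data dependence, here is a minimal userspace
sketch. It is not the benchmark from this thread. It uses Python's zlib as a
stand-in compressor (an assumption: zram's algorithm is configurable and is
not zlib in this form), and the 64K buffer size and fill patterns are chosen
only for illustration.

```python
import os
import zlib

# 64K buffer, matching the base page size discussed above (illustrative only).
PAGE_SIZE = 64 * 1024


def compression_ratio(data: bytes) -> float:
    """Return original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data))


# Repetitive fill, similar in spirit to a microbenchmark that writes one
# pattern across its whole buffer.
repetitive_page = b"\xab" * PAGE_SIZE

# Random fill, a stand-in for data that is effectively incompressible
# (for example, already-compressed or encrypted contents).
random_page = os.urandom(PAGE_SIZE)

ratio_repetitive = compression_ratio(repetitive_page)
ratio_random = compression_ratio(random_page)

print(f"repetitive: {ratio_repetitive:.1f}x")
print(f"random:     {ratio_random:.2f}x")
```

The repetitive buffer shrinks by orders of magnitude while the random one
does not shrink at all, so throughput numbers taken with one kind of data
say little about the other.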
