On Tue, May 26, 2026 at 01:10:55AM +0000, Jaegeuk Kim wrote:
> Background
> ----------
> The primary use case is accelerating AI model loading, which demands
> exceptionally high sequential read speeds. In our benchmarks on embedded
> systems:
> - Using high-order page allocations allows the system to saturate the
>   Universal Flash Storage (UFS) bandwidth, reaching 4 GB/s even at
>   medium-to-low CPU frequencies.
> - In contrast, standard small folios cap performance at 2 GB/s.
>
> The performance doubling stems directly from reducing CPU cycle overhead
> during memory allocation.

When you say "AI model loading", are you mmap()ing the file of weights,
or are you calling read() to load the file into anonymous memory?

This matters because for the first operation, you need to allocate
folios of PMD size in order to make best use of TLB entries. For the
second operation, it's more important to iterate through the file
quickly, freeing folios behind you after you access them so they're
available for the next batch.
