On 05/28, Matthew Wilcox wrote:
> On Tue, May 26, 2026 at 01:10:55AM +0000, Jaegeuk Kim wrote:
> > Background
> > ----------
> > The primary use case is accelerating AI model loading, which demands
> > exceptionally high sequential read speeds. In our benchmarks on embedded
> > systems:
> > - Using high-order page allocations allows the system to saturate the
> >   Universal Flash Storage (UFS) bandwidth, reaching 4 GB/s even at
> >   medium-to-low CPU frequencies.
> > - In contrast, standard small folios cap performance at 2 GB/s.
> >
> > The performance doubling stems directly from reducing CPU cycle overhead
> > during memory allocation.
>
> When you say "AI model loading", are you mmap()ing the file of weights,
> or are you calling read() to load the file into anonymous memory?
>
> This matters because for the first operation, you need to allocate folios
> of PMD size in order to make best use of TLB entries. For the second
> operation, it's more important to iterate through the file quickly,
> freeing folios behind you after you access them so they're available
> for the next batch.
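
For concreteness, the two loading patterns distinguished above could look roughly like the sketch below. This is only an illustration under assumptions, not our actual loader: the helper names (sum_bytes, load_by_mmap, load_by_read) and the 1 MiB chunk size are made up, and the byte sum just stands in for "touch every byte of the weights". RWF_DONTCACHE is not shown.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: sum all bytes so both paths do comparable work. */
static uint64_t sum_bytes(const unsigned char *p, size_t n)
{
	uint64_t s = 0;

	for (size_t i = 0; i < n; i++)
		s += p[i];
	return s;
}

/* Case 1: map the weights file. MAP_POPULATE prefaults the mapping,
 * which for a file mapping triggers read-ahead on the file. */
static int load_by_mmap(const char *path, uint64_t *sum)
{
	struct stat st;
	void *p;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0 || st.st_size == 0) {
		close(fd);
		return -1;
	}
	p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE | MAP_POPULATE, fd, 0);
	close(fd);	/* the mapping stays valid after close() */
	if (p == MAP_FAILED)
		return -1;
	*sum = sum_bytes(p, st.st_size);
	munmap(p, st.st_size);
	return 0;
}

/* Case 2: read() the file into anonymous (malloc'ed) memory,
 * one fixed-size chunk at a time. */
static int load_by_read(const char *path, uint64_t *sum)
{
	enum { CHUNK = 1 << 20 };	/* assumed chunk size: 1 MiB */
	unsigned char *buf = malloc(CHUNK);
	int fd = open(path, O_RDONLY);
	ssize_t n = -1;

	if (buf && fd >= 0) {
		*sum = 0;
		while ((n = read(fd, buf, CHUNK)) > 0)
			*sum += sum_bytes(buf, (size_t)n);
	}
	if (fd >= 0)
		close(fd);
	free(buf);
	return n < 0 ? -1 : 0;
}
```

Both helpers return 0 on success and produce the same sum for the same file; the difference is only in where the data lands (page cache mapped into the process vs. a private buffer), which is what the folio-size question above is about.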

We deal with multiple options, though. What I'm looking at is mostly
preloading models with mmap(MAP_POPULATE), which takes the readahead path
that bumps up the order by 2. Previously I also looked at
fadvise(WILLNEED), but gave up due to the broken interface. OTOH, we use
RWF_DONTCACHE for the read() case, but I don't think it's ideal for the
best loading performance.

_______________________________________________
Linux-f2fs-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
