On Tue, 14 Oct 2025 15:04:14 +0200 Christian König <[email protected]> wrote:
> On 14.10.25 14:44, Zhaoyang Huang wrote:
> > On Tue, Oct 14, 2025 at 7:59 PM Christian König
> > <[email protected]> wrote:
> >>
> >> On 14.10.25 10:32, zhaoyang.huang wrote:
> >>> From: Zhaoyang Huang <[email protected]>
> >>>
> >>> A single dma-buf allocation can be dozens of MB or more, which
> >>> introduces a loop allocating several thousand order-0 pages.
> >>> Furthermore, concurrent allocations can push a dma-buf allocation
> >>> into direct reclaim during that loop. This commit eliminates both
> >>> effects by introducing alloc_pages_bulk_list in dma-buf's order-0
> >>> allocation. The patch proved conditionally helpful for an 18MB
> >>> allocation, decreasing the time from 24604us to 6555us, and does
> >>> no harm when bulk allocation cannot be done (it falls back to
> >>> single-page allocation).
> >>
> >> Well that sounds like an absolutely horrible idea.
> >>
> >> See, the handling of allocating only from specific orders is
> >> *exactly* there to avoid the behavior of bulk allocation.
> >>
> >> What you seem to do with this patch is to add, on top of the
> >> behavior that avoids allocating large chunks from the buddy, the
> >> behavior of allocating large chunks from the buddy because that is
> >> faster.
> >
> > emm, this patch doesn't change the order-8 and order-4 allocation
> > behaviour; it just replaces the loop of order-0 allocations with a
> > single bulk allocation in the fallback path. What is your concern
> > about this?
>
> As far as I know the bulk allocation favors splitting large pages into
> smaller ones instead of allocating smaller pages first. That's where
> the performance benefit comes from.
>
> But that is exactly what we try to avoid here by allocating only
> certain orders of pages.

This is a good question, actually. Yes, bulk alloc will split large
pages if there are insufficient pages on the pcp free list. But is
dma-buf indeed trying to avoid that, or is it merely using an
inefficient API?
And does it need the extra speed? Even if it leads to increased
fragmentation?

Petr T
