On 6/18/26 13:13, Gregory Price wrote:
> On Thu, Jun 18, 2026 at 10:21:30AM +0200, Vlastimil Babka (SUSE) wrote:
>> On 6/15/26 17:37, Gregory Price wrote:
>> > 
>> > One thought would be a way to switch what fallback list is used, and
>> > then have specific fallback lists for certain contexts.
>> > 
>> > Right now there is a single example of this: __GFP_THISNODE
>> >   |= __GFP_THISNODE   =>  NOFALLBACK
>> >   &= ~__GFP_THISNODE  =>  FALLBACK
>> > 
>> > We could add an interface with the desired fallback list passed as an
>> > argument, and let get_page_from_freelist() prefer that over the default
>> > global lists.
>> 
>> Does it mean a new argument in a number of functions in the page allocator,
>> or can it be mapped to alloc_flags (at least internally?), because the
>> number of possible fallback lists is small enough?
>>
> 
> What I ended up with was adding a single page_alloc.c external interface
> that allows you to define the zonelist via an enum, and then an internal
> selector resolution in prepare_alloc_pages() stored in alloc_context.

OK. Since it's in alloc_context, there should be no parameter bloat
inside the page allocator. And for the single external entry point it's
better to be explicit.

> 
> eg:
> 
> static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
>                 int preferred_nid, nodemask_t *nodemask,
>                 struct alloc_context *ac, gfp_t *alloc_gfp,
>                 unsigned int *alloc_flags)
> {       
>         ac->highest_zoneidx = gfp_zone(gfp_mask);
>         ac->zonelist = select_zonelist(preferred_nid, gfp_mask, ac->zlsel);
>       ... snip ...
> }
> 
> struct folio *__folio_alloc_zonelist_noprof(gfp_t gfp, unsigned int order,
>                 int preferred_nid, nodemask_t *nodemask,
>                 enum alloc_zonelist zlsel);
> 
> 
> The original __folio_alloc* functions just add a DEFAULT - which tells
> select_zonelist() to base the decision on __GFP_THISNODE.
> 
> 
> struct folio *__folio_alloc_noprof(gfp_t gfp, unsigned int order,
>                 int preferred_nid, nodemask_t *nodemask)
> {
>         return __folio_alloc_core(gfp, order, preferred_nid, nodemask,
>                                   ALLOC_ZONELIST_DEFAULT);
> }
> EXPORT_SYMBOL(__folio_alloc_noprof);
> 
> 
> This does a few things
>   - The isolation is structural, there is no way to accidentally
>     allocate private memory without passing ALLOC_ZONELIST_PRIVATE
> 
>   - The isolation forces folios - there are no non-folio interfaces
>     which allow zonelist selection
> 
>   - The zonelist selection is confined to this allocation context,
>     so no inheritance is possible.
> 

Ack.
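
For readers following along: the body of select_zonelist() is not shown in
the thread, so the following is only a user-space sketch of the selection
rule described above, not the actual patch. Everything prefixed mock_/MOCK_
is a stand-in invented for illustration (including the third per-node list
index); upstream only has the FALLBACK and NOFALLBACK per-node lists.

```c
#include <assert.h>

/* Mock stand-ins; the real kernel definitions differ. */
typedef unsigned int gfp_t;
#define MOCK_GFP_THISNODE 0x1u	/* stand-in for __GFP_THISNODE */

enum alloc_zonelist {
	ALLOC_ZONELIST_DEFAULT,	/* decide from __GFP_THISNODE, as today */
	ALLOC_ZONELIST_PRIVATE,	/* explicit opt-in to private memory */
};

/* Hypothetical per-node list indices; PRIVATE is an assumption. */
enum mock_zonelist_idx {
	MOCK_ZONELIST_FALLBACK,
	MOCK_ZONELIST_NOFALLBACK,
	MOCK_ZONELIST_PRIVATE,
	MOCK_MAX_ZONELISTS,
};

struct mock_zonelist { int unused; };
struct mock_node { struct mock_zonelist zonelists[MOCK_MAX_ZONELISTS]; };

#define MOCK_NR_NODES 2
static struct mock_node mock_nodes[MOCK_NR_NODES];

/*
 * An explicit PRIVATE selector wins. DEFAULT falls back to the existing
 * rule: __GFP_THISNODE picks the no-fallback list, anything else picks
 * the normal fallback list.
 */
static struct mock_zonelist *mock_select_zonelist(int nid, gfp_t gfp,
						  enum alloc_zonelist zlsel)
{
	assert(nid >= 0 && nid < MOCK_NR_NODES);

	if (zlsel == ALLOC_ZONELIST_PRIVATE)
		return &mock_nodes[nid].zonelists[MOCK_ZONELIST_PRIVATE];
	if (gfp & MOCK_GFP_THISNODE)
		return &mock_nodes[nid].zonelists[MOCK_ZONELIST_NOFALLBACK];
	return &mock_nodes[nid].zonelists[MOCK_ZONELIST_FALLBACK];
}
```

The point of the sketch is the structural property: a caller that never
passes ALLOC_ZONELIST_PRIVATE can only ever reach the two existing lists,
whatever gfp flags it uses.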

> 
> I tried to avoid using an ALLOC_ flag so we can avoid yet another flag
> crunch, but there certainly are few enough zonelists that we could
> encode it there and expose it.  I know Brendan was looking at plumbing
> alloc flags out to an interface, so I'm open to that.
> 
> Externally the way I determine what zonelist to use is a lookup based on
> reason - letting the node filter.  This is really only needed in a
> couple spots:
> 
> mm/khugepaged.c:  enum alloc_zonelist zlsel = alloc_zonelist_for_node(node, NODE_ALLOC_RECLAIM);
> mm/vmscan.c:      mtc->zlsel = alloc_zonelist_for_nodemask(mtc->nmask, NODE_ALLOC_TIERING);
> mm/migrate.c:     .zlsel = alloc_zonelist_for_node(node, NODE_ALLOC_USER_MIGRATE),
> 
> static inline enum alloc_zonelist
> alloc_zonelist_for_node(int nid, enum node_alloc_reason reason)
> {
>         bool ok;
> 
>         if (!node_state(nid, N_MEMORY_PRIVATE))
>                 return ALLOC_ZONELIST_DEFAULT;
>         switch (reason) {
>         case NODE_ALLOC_RECLAIM:
>                 ok = node_is_reclaimable(nid);
>                 break;
>         case NODE_ALLOC_TIERING:
>                 ok = node_allows_tiering(nid);
>                 break;
>         case NODE_ALLOC_USER_MIGRATE:
>                 ok = node_allows_user_migrate(nid);
>                 break;
>         default:
>                 ok = false;
>         }
>         return ok ? ALLOC_ZONELIST_PRIVATE : ALLOC_ZONELIST_DEFAULT;
> }
> 
> Otherwise... everything is now a mempolicy w/ MPOL_F_BIND and all the
> handling goes through the normal fault-paths :]
> 
> static struct page *__alloc_pages_mpol(gfp_t gfp, unsigned int order,
>                 struct mempolicy *pol, pgoff_t ilx, int nid)
> {
>         nodemask_t *nodemask;
>         struct page *page;
>         enum alloc_zonelist zlsel = (pol->flags & MPOL_F_PRIVATE) ?
>                 ALLOC_ZONELIST_PRIVATE : ALLOC_ZONELIST_DEFAULT;
> ...
>         if (pol->mode == MPOL_PREFERRED_MANY)
>                 return alloc_pages_preferred_many(gfp, order, nid, nodemask,
>                                                   zlsel);
> ...
> }
> 
> 
> Switching to an alloc_flag would probably be trivial if that's really
> wanted.

I guess not. Thanks for the explanation!
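
For the archive, the alloc_flags alternative mentioned above would amount
to packing the selector into a small bitfield. The sketch below is purely
illustrative: the MOCK_ALLOC_ZLSEL_* shift and width are made-up values,
not existing ALLOC_* bits, and the helper names are hypothetical.

```c
#include <assert.h>

enum alloc_zonelist {
	ALLOC_ZONELIST_DEFAULT,
	ALLOC_ZONELIST_PRIVATE,
	ALLOC_ZONELIST_NR,
};

/* Hypothetical field placement; a real patch would need a free bit. */
#define MOCK_ALLOC_ZLSEL_SHIFT	12u
#define MOCK_ALLOC_ZLSEL_BITS	1u
#define MOCK_ALLOC_ZLSEL_MASK \
	(((1u << MOCK_ALLOC_ZLSEL_BITS) - 1u) << MOCK_ALLOC_ZLSEL_SHIFT)

/* Store the selector in alloc_flags, leaving all other bits untouched. */
static unsigned int mock_alloc_flags_set_zlsel(unsigned int alloc_flags,
					       enum alloc_zonelist zlsel)
{
	assert((unsigned int)zlsel < ALLOC_ZONELIST_NR);

	alloc_flags &= ~MOCK_ALLOC_ZLSEL_MASK;
	return alloc_flags | ((unsigned int)zlsel << MOCK_ALLOC_ZLSEL_SHIFT);
}

/* Recover the selector from alloc_flags. */
static enum alloc_zonelist mock_alloc_flags_get_zlsel(unsigned int alloc_flags)
{
	return (enum alloc_zonelist)
		((alloc_flags & MOCK_ALLOC_ZLSEL_MASK) >> MOCK_ALLOC_ZLSEL_SHIFT);
}
```

With only two zonelist selectors a single bit is enough, which is the
"few enough zonelists" point; the cost is one more consumer of the
alloc_flags space.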

> ~Gregory

