On 6/18/26 13:13, Gregory Price wrote:
> On Thu, Jun 18, 2026 at 10:21:30AM +0200, Vlastimil Babka (SUSE) wrote:
>> On 6/15/26 17:37, Gregory Price wrote:
>> >
>> > One thought would be a way to switch what fallback list is used, and
>> > then have specific fallback lists for certain contexts.
>> >
>> > Right now there is a single example of this: __GFP_THISNODE
>> > |= __GFP_THISNODE => NOFALLBACK
>> > &= ~__GFP_THISNODE => FALLBACK
>> >
>> > We could add an interface with the desired fallback list passed as an
>> > argument, and let get_page_from_freelist() prefer that over the default
>> > global lists.
>>
>> Does it mean a new argument in a number of functions in the page allocator,
>> or can it be mapped to alloc_flags (at least internally?), because the
>> number of possible fallback lists is small enough?
>>
>
> What I ended up with was adding a single page_alloc.c external interface
> that allows you to define the zonelist via an enum, and then an internal
> selector resolution in prepare_alloc_pages() stored in alloc_context.
OK. Since it's in alloc_context, there should be no parameter bloat
inside the page allocator. And for the single external entry point it's better
to be explicit.
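
To check my understanding of the DEFAULT vs PRIVATE resolution, here is a
minimal userspace sketch. It is not the actual patch: the mock_ types, the
MOCK_ZONELIST_PRIVATE index, and the choice that PRIVATE overrides
__GFP_THISNODE are my assumptions. Only the alloc_zonelist enum names come
from your snippets.

```c
#include <assert.h>

/* Userspace mock, not kernel code: every type below is a stand-in. */
#define MOCK_GFP_THISNODE	0x1u	/* stand-in for __GFP_THISNODE */
#define MOCK_MAX_NODES		4

enum mock_zonelist_idx {
	MOCK_ZONELIST_FALLBACK,		/* mirrors ZONELIST_FALLBACK */
	MOCK_ZONELIST_NOFALLBACK,	/* mirrors ZONELIST_NOFALLBACK */
	MOCK_ZONELIST_PRIVATE,		/* hypothetical third list */
	MOCK_MAX_ZONELISTS,
};

enum alloc_zonelist {
	ALLOC_ZONELIST_DEFAULT,
	ALLOC_ZONELIST_PRIVATE,
};

struct mock_zonelist {
	int dummy;			/* real zonelists hold zonerefs */
};

struct mock_node {
	struct mock_zonelist zonelists[MOCK_MAX_ZONELISTS];
};

static struct mock_node mock_nodes[MOCK_MAX_NODES];

/*
 * DEFAULT keeps the existing rule (the gfp flag picks FALLBACK or
 * NOFALLBACK). PRIVATE takes precedence here; that ordering is an
 * assumption of this sketch.
 */
static struct mock_zonelist *
mock_select_zonelist(int nid, unsigned int gfp, enum alloc_zonelist zlsel)
{
	enum mock_zonelist_idx idx;

	assert(nid >= 0 && nid < MOCK_MAX_NODES);

	if (zlsel == ALLOC_ZONELIST_PRIVATE)
		idx = MOCK_ZONELIST_PRIVATE;
	else if (gfp & MOCK_GFP_THISNODE)
		idx = MOCK_ZONELIST_NOFALLBACK;
	else
		idx = MOCK_ZONELIST_FALLBACK;

	return &mock_nodes[nid].zonelists[idx];
}
```

With that shape, a caller that never passes ALLOC_ZONELIST_PRIVATE can only
reach the two existing lists.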
>
> eg:
>
> static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
> 		int preferred_nid, nodemask_t *nodemask,
> 		struct alloc_context *ac, gfp_t *alloc_gfp,
> 		unsigned int *alloc_flags)
> {
> 	ac->highest_zoneidx = gfp_zone(gfp_mask);
> 	ac->zonelist = select_zonelist(preferred_nid, gfp_mask, ac->zlsel);
> 	... snip ...
> }
>
> struct folio *__folio_alloc_zonelist_noprof(gfp_t gfp, unsigned int order,
> 		int preferred_nid, nodemask_t *nodemask,
> 		enum alloc_zonelist zlsel);
>
>
> The original __folio_alloc* functions just add a DEFAULT - which tells
> select_zonelist() to base the decision on __GFP_THISNODE.
>
>
> struct folio *__folio_alloc_noprof(gfp_t gfp, unsigned int order,
> 		int preferred_nid, nodemask_t *nodemask)
> {
> 	return __folio_alloc_core(gfp, order, preferred_nid, nodemask,
> 				  ALLOC_ZONELIST_DEFAULT);
> }
> EXPORT_SYMBOL(__folio_alloc_noprof);
>
>
> This does a few things:
> - The isolation is structural: there is no way to accidentally
> allocate private memory without passing ALLOC_ZONELIST_PRIVATE
>
> - The isolation forces folios - there are no non-folio interfaces
> which allow zonelist selection
>
> - The zonelist selection is confined to this allocation context,
> so no inheritance is possible.
>
Ack.
>
> I tried to avoid using an ALLOC_ flag so we can avoid yet another flag
> crunch, but there certainly are few enough zonelists that we could
> encode it there and expose it. I know Brendan was looking at plumbing
> alloc flags out to an interface, so I'm open to that.
>
> Externally the way I determine what zonelist to use is a lookup based on
> reason - letting the node filter. This is really only needed in a
> couple spots:
>
> mm/khugepaged.c: enum alloc_zonelist zlsel = alloc_zonelist_for_node(node,
> NODE_ALLOC_RECLAIM);
> mm/vmscan.c: mtc->zlsel = alloc_zonelist_for_nodemask(mtc->nmask,
> NODE_ALLOC_TIERING);
> mm/migrate.c: .zlsel = alloc_zonelist_for_node(node,
> NODE_ALLOC_USER_MIGRATE),
>
> static inline enum alloc_zonelist
> alloc_zonelist_for_node(int nid, enum node_alloc_reason reason)
> {
> 	bool ok;
>
> 	if (!node_state(nid, N_MEMORY_PRIVATE))
> 		return ALLOC_ZONELIST_DEFAULT;
> 	switch (reason) {
> 	case NODE_ALLOC_RECLAIM:
> 		ok = node_is_reclaimable(nid);
> 		break;
> 	case NODE_ALLOC_TIERING:
> 		ok = node_allows_tiering(nid);
> 		break;
> 	case NODE_ALLOC_USER_MIGRATE:
> 		ok = node_allows_user_migrate(nid);
> 		break;
> 	default:
> 		ok = false;
> 	}
> 	return ok ? ALLOC_ZONELIST_PRIVATE : ALLOC_ZONELIST_DEFAULT;
> }
>
> Otherwise... everything is now a mempolicy w/ MPOL_F_BIND and all the
> handling goes through the normal fault-paths :]
>
> static struct page *__alloc_pages_mpol(gfp_t gfp, unsigned int order,
> 		struct mempolicy *pol, pgoff_t ilx, int nid)
> {
> 	nodemask_t *nodemask;
> 	struct page *page;
> 	enum alloc_zonelist zlsel = (pol->flags & MPOL_F_PRIVATE) ?
> 		ALLOC_ZONELIST_PRIVATE : ALLOC_ZONELIST_DEFAULT;
> 	...
> 	if (pol->mode == MPOL_PREFERRED_MANY)
> 		return alloc_pages_preferred_many(gfp, order, nid, nodemask,
> 						  zlsel);
> 	...
> }
>
>
> Switching to an alloc_flag would probably be trivial if that's really
> wanted.
I guess not. Thanks for the explanation!
> ~Gregory