On Tue, Feb 24, 2026 at 01:22:36PM +0100, Michal Hocko wrote:
> On Tue 24-02-26 12:39:33, Uladzislau Rezki wrote:
> > On Mon, Feb 23, 2026 at 11:08:02PM +0100, Michal Hocko wrote:
> > > On Mon 23-02-26 20:25:38, Mikulas Patocka wrote:
> > > > 
> > > > 
> > > > On Mon, 23 Feb 2026, Vishal Moola (Oracle) wrote:
> > > > 
> > > > > On Thu, Feb 12, 2026 at 05:33:30PM +0100, Mikulas Patocka wrote:
> > > > > > The commit 07003531e03c8 ("mm/vmalloc: warn on invalid vmalloc gfp
> > > > > > flags") breaks the device mapper VDO target. The VDO target calls
> > > > > > vmalloc with __GFP_RETRY_MAYFAIL and this flag is not in the mask of
> > > > > > allowed flags.
> > > > > > 
> > > > > > There is no reason why vmalloc couldn't support
> > > > > > __GFP_RETRY_MAYFAIL, so let's add this flag to GFP_VMALLOC_SUPPORTED.
> > > > > 
> > > > > My only skepticism about this comes from the line in the
> > > > > vmalloc_node_range() doc: 
> > > > > "and %__GFP_RETRY_MAYFAIL are not supported."
> > > > > 
> > > > > I myself don't know why that may be. Could you elaborate on if/why the
> > > > > doc is wrong please?
> > > > 
> > > > This statement was added by Michal Hocko in the commit
> > > > b7d90e7a5ea8d64e668d5685925900d33d3884d5. Michal, could you explain why
> > > > you think that __GFP_RETRY_MAYFAIL is not supported?
> > > 
> > > The problem with __GFP_RETRY_MAYFAIL is that it cannot be fully
> > > supported. While pages that back the allocation can be easily made aware
> > > of this failure mode there are page table allocations which are
> > > hard-coded GFP_KERNEL and there is no sensible way to extend the API to
> > > change that (as we have learned several times over the years).
> > > 
> > > > The VDO module needs to allocate large amounts of memory and it doesn't 
> > > > want to trigger the OOM killer (which would kill some innocent task and 
> > > > wouldn't solve the out of memory condition at all), so I think that 
> > > > __GFP_RETRY_MAYFAIL is appropriate.
> > > 
> > > Understood. But as said the very page table allocation could be the
> > > trigger for the unwanted OOM. The same applies to __GFP_NORETRY
> > > unfortunately as well. vmalloc has just recently gained support for the
> > > GFP_NOWAIT allocation mode, though. This will make allocation failure
> > > much more likely, so I am not entirely sure this is a proper solution
> > > for your problem.
> > >
> > Yes, the page-table manipulation entries are hard-coded and it looks
> > like it is the last path which is not wired properly with gfp-flags.
> > 
> > Since we grow PTEs and never release them, it might not be a big issue
> > for the __GFP_RETRY_MAYFAIL usage. But it is still not valid in the
> > noted path.
> 
> One thing that we could do to improve __GFP_RETRY_MAYFAIL resp.
> __GFP_NORETRY is to use NOWAIT allocation semantics for page table
> allocations as those could be achieved by scoped allocation context.
> This could cause premature failure after the whole bunch of memory has
> already been allocated for the backing pages but considering that page
> table allocations should be more and more rare over system runtime it
> might be just a reasonable workaround. WDYT?
>
As far as I understand, Mikulas has been using __GFP_RETRY_MAYFAIL with
vmalloc for some time already. I have not seen any reports of a PTE/xxx
alloc path triggering the OOM killer, so I tend to say that it is not
easy to trigger.

I do not have a strong opinion about the workaround you noted. Maybe
Mikulas can switch to the NOWAIT flag instead.
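
To make the scoped-context idea a bit more concrete, below is a minimal
userspace sketch, not kernel code. Every name in it (scope_nowait_save(),
scope_restore(), pt_alloc(), map_with_scope()) is hypothetical and only
models the behaviour discussed above: the page-table allocation keeps its
hard-coded flags, but a scope set by the caller makes it fail fast
instead of escalating to the OOM killer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the kernel's real definitions. */
enum alloc_mode { MODE_MAY_BLOCK, MODE_NOWAIT };

/* Would be per-task state in a real kernel; a global is enough here. */
static enum alloc_mode scope_mode = MODE_MAY_BLOCK;

/* Enter a NOWAIT scope and return the previous mode for restoring. */
static enum alloc_mode scope_nowait_save(void)
{
	enum alloc_mode old = scope_mode;

	scope_mode = MODE_NOWAIT;
	return old;
}

static void scope_restore(enum alloc_mode old)
{
	scope_mode = old;
}

/*
 * Models a page-table allocation whose flags the caller cannot pass in.
 * Under memory pressure it either fails fast (NOWAIT scope) or records
 * that the blocking path would have invoked the OOM killer and then
 * succeeds.
 */
static void *pt_alloc(bool memory_tight, bool *oom_invoked)
{
	assert(oom_invoked != NULL);

	if (memory_tight) {
		if (scope_mode == MODE_NOWAIT)
			return NULL;
		*oom_invoked = true;
	}
	return malloc(64);
}

/* Wrap the page-table allocation in a NOWAIT scope, then restore it. */
static void *map_with_scope(bool memory_tight, bool *oom_invoked)
{
	enum alloc_mode old = scope_nowait_save();
	void *pt = pt_alloc(memory_tight, oom_invoked);

	scope_restore(old);
	return pt;
}
```

In this model a caller of map_with_scope() sees NULL under memory
pressure and has to unwind the backing pages it already allocated, which
is the premature-failure trade-off mentioned above.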

--
Uladzislau Rezki
