On Tue, Mar 10, 2026, Ackerley Tng wrote:
> Sean Christopherson <[email protected]> writes:
> > On Mon, Mar 09, 2026, Ackerley Tng wrote:
> > But even if that's somehow the "right" behavior, we're doing it purely by
> > accident.
> >
> > As for this patch, if we fix that bug by returning 0, then
> > filemap_release_folio() is definitely reachable by at least one flow, so I
> > think guest_memfd also needs to implement release_folio()?
> >
> 
> Is posix_fadvise() the one flow you're talking about?

No, I'm saying if we fix the memory error case, then filemap_release_folio()
likely becomes reachable.  Though there may be other cases.

> It indeed calls filemap_release_folio() through mapping_try_invalidate()
> -> mapping_evict_folio() -> filemap_release_folio().
> 
> From Documentation/filesystems/locking.rst:
> 
>   ->release_folio() is called when the MM wants to make a change to the
>   folio that would invalidate the filesystem's private data.  For example,
>   it may be about to be removed from the address_space or split.  The folio
>   is locked and not under writeback.  It may be dirty.  The gfp parameter
>   is not usually used for allocation, but rather to indicate what the
>   filesystem may do to attempt to free the private data.  The filesystem may
>   return false to indicate that the folio's private data cannot be freed.
>   If it returns true, it should have already removed the private data from
>   the folio.  If a filesystem does not provide a ->release_folio method,
>   the pagecache will assume that private data is buffer_heads and call
>   try_to_free_buffers().
> 
> I could implement .release_folio().
> 
> Returning false seems like the easier solution, and is kind of in line
> with the documentation above. A guest_memfd folio does not have private
> data, so without private data, the private data cannot be freed.

Eh, not really.  If there's no private data, then freeing it always succeeds.

> (Took me a while to notice that having private data is not the same
> as having something in folio->private, so this doesn't change even after
> the direct map removal series lands.)
> 
> Returning false is going to break shrink_folio_list(), but that probably
> won't affect guest_memfd for now.

Definitely not a problem; I'm very much against putting guest_memfd pages on
the kernel's standard LRU lists.

> Returning false also breaks page_cache_pipe_buf_try_steal(). Does anyone
> more familiar with splicing know if that could affect guest_memfd?

AFAICT, also not a problem until KVM supports .splice_read().

> Returning true could also work, to indicate that the folio's private
> data has been "removed". I'd also have to do inode_sub_bytes() in
> .release_folio() then, since in mapping_evict_folio(), remove_mapping()
> doesn't call .invalidate_folio().
> 
> Then we will have to separately ensure that in truncate_error_folio(),
> guest_memfd doesn't double-deduct the folio's size from the inode. This
> should be semantically correct though, since IIUC .invalidate_folio() is
> when a folio is removed (clean or dirty), but .release_folio() is only
> for clean folios. If .error_remove_folio() returns MF_DELAYED, the
> truncation didn't happen and so there should be no call to
> .release_folio().

Before we dive deep into solutions, what's the motivation for making fstat()
work?
As I asked in the cover letter:

  P.S. In future versions, please explain _why_ you want to add fstat() support,
  i.e. why you want to account allocated bytes/folios.  For folks like me that
  do very little userspace programming, and even less filesystems work, fstat()
  not working means nothing.  Even if the answer is "because literally every
  other FS in Linux works".
