On 2/24/26 00:42, Ackerley Tng wrote:
> "David Hildenbrand (Arm)" <[email protected]> writes:
>
>> On 2/23/26 08:04, Ackerley Tng wrote:
>>> Hi,
>>>
>>> Currently, guest_memfd doesn't update inode's i_blocks or i_bytes at
>>> all. Hence, st_blocks in the struct populated by a userspace fstat()
>>> call on a guest_memfd will always be 0. This patch series makes
>>> guest_memfd track the amount of memory allocated on an inode, which
>>> allows fstat() to accurately report that on requests from userspace.
>>>
>>> The inode's i_blocks and i_bytes fields are updated when the folio is
>>> associated or disassociated from the guest_memfd inode, which are at
>>> allocation and truncation times respectively.
>>>
>>> To update inode fields at truncation time, this series implements a
>>> custom truncation function for guest_memfd. An alternative would be to
>>> update truncate_inode_pages_range() to return the number of bytes
>>> truncated or add/use some hook.
>>>
>>> Implementing a custom truncation function was chosen to provide
>>> flexibility for handling truncations in future when guest_memfd
>>> supports sources of pages other than the buddy allocator. This
>>> approach of a custom truncation function also aligns with shmem, which
>>> has a custom shmem_truncate_range().
>>
>> Just wondered how shmem does it: it's through
>> dquot_alloc_block_nodirty() / dquot_free_block_nodirty().
>>
>> It's a shame we can't just use free_folio().
>
> Yup, Hugh pointed out that struct address_space *mapping (and inode) may
> already have been freed by the time .free_folio() is called [1].
>
> [1] https://lore.kernel.org/all/[email protected]/
>
>> Could we maybe have a
>> different callback (when the mapping is still guaranteed to be around)
>> from where we could update i_blocks on the freeing path?
>
> Do you mean that we should add a new callback to struct
> address_space_operations?
If that avoids having to implement truncation completely ourselves, that
might be one option we could discuss, yes.
Something like:
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 7c753148af88..94f8bb81f017 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -764,6 +764,7 @@ cache in your filesystem. The following members are defined:
sector_t (*bmap)(struct address_space *, sector_t);
void (*invalidate_folio) (struct folio *, size_t start, size_t len);
bool (*release_folio)(struct folio *, gfp_t);
+ void (*remove_folio)(struct folio *folio);
void (*free_folio)(struct folio *);
ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter);
int (*migrate_folio)(struct mapping *, struct folio *dst,
@@ -922,6 +923,11 @@ cache in your filesystem. The following members are defined:
its release_folio will need to ensure this. Possibly it can
clear the uptodate flag if it cannot free private data yet.
+``remove_folio``
+ remove_folio is called just before the folio is removed from the
+ page cache in order to allow the cleanup of properties (e.g.,
+ accounting) that needs the address_space mapping.
+
``free_folio``
free_folio is called once the folio is no longer visible in the
page cache in order to allow the cleanup of any private data.
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8b3dd145b25e..f7f6930977a1 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -422,6 +422,7 @@ struct address_space_operations {
sector_t (*bmap)(struct address_space *, sector_t);
void (*invalidate_folio) (struct folio *, size_t offset, size_t len);
bool (*release_folio)(struct folio *, gfp_t);
+ void (*remove_folio)(struct folio *folio);
void (*free_folio)(struct folio *folio);
ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter);
/*
diff --git a/mm/filemap.c b/mm/filemap.c
index 6cd7974d4ada..5a810eaacab2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -250,8 +250,14 @@ void filemap_free_folio(struct address_space *mapping, struct folio *folio)
void filemap_remove_folio(struct folio *folio)
{
struct address_space *mapping = folio->mapping;
+ void (*remove_folio)(struct folio *);
BUG_ON(!folio_test_locked(folio));
+
+ remove_folio = mapping->a_ops->remove_folio;
+ if (unlikely(remove_folio))
+ remove_folio(folio);
+
spin_lock(&mapping->host->i_lock);
xa_lock_irq(&mapping->i_pages);
__filemap_remove_folio(folio, NULL);
Ideally we'd perform it under the lock just after clearing folio->mapping,
but I guess that might be more controversial.
For the accounting you need, the above might be good enough, but I am not
sure how many other use cases there might be.
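
For guest_memfd, such a callback could keep the accounting next to the
allocation path. A rough, untested sketch only, assuming the ->remove_folio
hook proposed above and that inode_sub_bytes() is the right helper to undo
the allocation-time accounting (the kvm_gmem_* names here are hypothetical):

	/*
	 * Hypothetical sketch: assumes the ->remove_folio callback proposed
	 * above exists and runs while folio->mapping is still valid.
	 */
	static void kvm_gmem_remove_folio(struct folio *folio)
	{
		struct inode *inode = folio->mapping->host;

		/* Undo the i_blocks/i_bytes update done at allocation time. */
		inode_sub_bytes(inode, folio_size(folio));
	}

	static const struct address_space_operations kvm_gmem_aops = {
		/* ... existing guest_memfd a_ops ... */
		.remove_folio	= kvm_gmem_remove_folio,
	};

That would leave truncation itself to the generic code and only hook the
point where the folio leaves the page cache.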
--
Cheers,
David