On 2026/5/8 16:51, Christoph Hellwig wrote:
On Fri, May 08, 2026 at 04:39:15PM +0800, Gao Xiang wrote:
Currently, EROFS file-backed mounts read metadata directly from the
underlay fs page cache. This is mainly used for composefs, etc., so
that different EROFS instances do not each keep their own EROFS page
cache for the same underlay backing file, which also avoids
unnecessary copies into them. That is also what composefs once did in
their codebase.

Since EROFS just reads the underlay fs page cache and does _not_
touch anything inside the underlay page cache itself, I guess
it's fine?

At the micro-level this does mean erofs needs to do the checks itself.
OTOH it means this whole scheme is completely broken.  The page cache
is owned by the file system, so erofs can't simply poke into it.

The page cache is indeed owned by the underlay file system,
but erofs doesn't poke into it: it only needs temporary read
access to metadata, without allocating extra buffers.

On the one hand, I hope there could be some interface for
such temporary usage rather than only the vfs_iter_read model.


Now for reads it mostly works on the most common disk-based file systems,
but it does create lots of problems for slightly more complex ones like
network/clustered or synthetic file systems.  It also really breaks

Just out of curiosity, could you point out one specific path
so I can look into that?

layering, so we need to fix it.  Not sure what would be best, but I'd be
tempted to have a cross-instance cache maintained by erofs and filled
using in-kernel direct I/O.  IFF the page policies work great for you

Direct I/O may be inappropriate in many cases: users will have just
downloaded the images from remotes using buffered I/O, so direct I/O
just makes things worse (it invalidates the cache and rereads from
disk), and it causes double caching if the underlay file is also read.

that even could be a synthetic inode/mapping.

I expected similar comments. If we really need to work out such a
cross-instance cache, I'm fine with implementing it for Linux 7.2.
It will increase the complexity of the codebase, and it won't share
the cache with the underlay fs.

But could we just fix this issue first for previous Linux versions?


On the other hand, we talked a bit about commit f2fed441c69b ("loop:
stop using vfs_iter_{read,write} for buffered I/O") in another
private thread related to fanotify, which lacks a proper
rw_verify_area() call as well, since it calls into the raw read/write
iter methods instead of using the previous vfs_iter_{read,write}.

Note that this does not add the bypass, just extends it to both I/O
types.  But yes, this breaks fanotify.  We actually have quite a few
raw ->read_iter/->write_iter calls, so this might need more structured
treatment.

I think it also bypasses the security hooks.

Thanks,
Gao Xiang


