On Fri, May 08, 2026 at 04:39:15PM +0800, Gao Xiang wrote:
> Currently EROFS file-backed mount metadata is directly using underlay
> fs page cache, which is mainly used for composefs, etc. to avoid
> different EROFS instances have their own EROFS page cache for the
> same underlay backing file and avoid unnecessary copies into them.
> --- That is also what composefs once did in their codebase.
>
> Since EROFS just read the underlayfs page cache and does _not_
> touch anything inside the underlay page cache itself, so I guess
> it's fine?

At the micro-level this does mean erofs needs to do the checks itself.
OTOH it means this whole scheme is completely broken. The page cache
is owned by the file system, so erofs can't simply poke into it.
Now for reads it mostly works on the most common disk-based file systems,
but it does create lots of problems for slightly more complex ones like
network/clustered or synthetic file systems. It also really breaks
layering, so we need to fix it. Not sure what would be best, but I'd be
tempted to have a cross-instance cache maintained by erofs and filled
using in-kernel direct I/O. IFF the page policies work great for you,
that could even be a synthetic inode/mapping.

> On the other hand, we talked a bit commit f2fed441c69b ("loop:
> stop using vfs_iter_{read,write} for buffered I/O") in another
> private thread related to fanotify, which lacks proper
> rw_verify_area() as well, since it called into raw read/write
> iter methods instead of using the previous vfs_iter_{read,write}.

Note that this does not add the bypass, just extends it to both I/O
types. But yes, this breaks fanotify. We actually have quite a few
raw ->read_iter/->write_iter calls, so this might need more structured
treatment.