On Mon, Jan 19, 2026 at 11:32:42AM -0800, Eric Biggers wrote:
> On Mon, Jan 19, 2026 at 07:33:49AM +0100, Christoph Hellwig wrote:
> > While looking at fsverity I'd like to understand the choice of offset
> > in ext4 and f2fs, and wonder about an issue.
> >
> > Both ext4 and f2fs round up the inode size to the next 64k boundary
> > and place the metadata there.  Both use the 65536 magic number for that
> > instead of a well-documented constant, unfortunately.
> >
> > I assume this was picked to align up to the largest reasonable page
> > size?  Unfortunately for that:
> >
> > a) not all architectures are reasonable.  As Darrick pointed out,
> > hexagon seems to support page sizes up to 1MiB.  While I don't know
> > if they exist in real life, powerpc supports up to 256kiB pages,
> > and I know they are used for real in various embedded settings
They *did* way back in the day; I worked with some seekrit PPC440s early
in my career.  I don't know that any of them still exist, but the code
is still there...

> > b) with large folio support in the page cache, the folios used to
> > map files can be much larger than the base page size, with all
> > the same issues as a larger page size
> >
> > So assuming that fsverity is trying to avoid the issue of a page/folio
> > that covers both data and fsverity metadata, how does it cope with that?
> > Do we need to disable fsverity on > 64k page size and disable large
> > folios on fsverity files?  The latter would mean writing back all cached
> > data first as well.
> >
> > And going forward, should we have a v2 format that fixes this?  For that
> > we'd still need a maximum folio size of course.  And of course I'd like
> > to get all these things right from the start in XFS, while still being as
> > similar as possible to ext4/f2fs.
>
> Yes, if I recall correctly it was intended to be the "largest reasonable
> page size".  It looks like PAGE_SIZE > 65536 can't work as-is, so indeed
> we should disable fsverity support in that configuration.
>
> I don't think large folios are quite as problematic.
> ext4_read_merkle_tree_page() and f2fs_read_merkle_tree_page() read a
> folio and return the appropriate page in it, and fs/verity/verify.c
> operates on the page.  If it's a page in the folio that spans EOF, I
> think everything will actually still work, except userspace will be able
> to see Merkle tree data after a 64K boundary past EOF if the file is
> mmapped using huge pages.

We don't allow mmapping file data beyond the EOF basepage, even if the
underlying folio is a large folio.  See generic/749, though recently
Kiryl Shutsemau tried to remove that restriction[1], until dchinner and
willy told him no.

> The mmap issue isn't great, but I'm not sure how much it matters,
> especially when the zeroes do still go up to a 64K boundary.

I'm concerned that post-EOF zeroing of a 256k folio could accidentally
obliterate Merkle tree content that had somehow already been loaded.
Though afaict from the existing codebases, none of them actually make
that mistake.

> If we do need to fix this, there are a couple things we could consider
> doing without changing the on-disk format in ext4 or f2fs: putting the
> data in the page cache at a different offset than it exists on-disk, or
> using "small" pages for EOF specifically.

I'd leave the on-disk offset as-is, but change the pagecache offset to
roundup(i_size_read(), mapping_max_folio_size_supported()) just to keep
file data and fsverity metadata completely separate.

> But yes, XFS should choose a larger alignment than 64K.

The roundup() formula above is what I'd choose for the pagecache offset
for xfs.  The on-disk offset of 1<<53 is ok with me.

--D

[1] https://lore.kernel.org/linux-fsdevel/20251014175214.GW6188@frogsfrogsfrogs/
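A rough sketch of the ->read_merkle_tree_page() pattern described above,
under the current layout.  This is not code from ext4 or f2fs:
fsverity_metadata_pos() is a made-up stand-in for the filesystem-specific
helper, and readahead (num_ra_pages) is ignored for brevity.

#include <linux/err.h>
#include <linux/fs.h>
#include <linux/pagemap.h>

static struct page *example_read_merkle_tree_page(struct inode *inode,
                                                  pgoff_t index,
                                                  unsigned long num_ra_pages)
{
        struct folio *folio;

        /*
         * Translate the Merkle tree page index into a pagecache index:
         * the tree is cached in the file's own address_space, starting
         * at the (currently 64K-rounded) metadata position returned by
         * the hypothetical fsverity_metadata_pos() helper.
         */
        index += fsverity_metadata_pos(inode) >> PAGE_SHIFT;

        folio = read_mapping_folio(inode->i_mapping, index, NULL);
        if (IS_ERR(folio))
                return ERR_CAST(folio);

        /*
         * With large folios, this folio may also cover file data and/or
         * span EOF; fs/verity/verify.c only ever looks at the single
         * page returned here.
         */
        return folio_file_page(folio, index);
}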

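And a minimal sketch of the pagecache placement proposed above, assuming
the roundup(), i_size_read() and mapping_max_folio_size_supported()
helpers as they exist in current kernels; merkle_cache_pos() is a
made-up name.

#include <linux/fs.h>
#include <linux/math.h>
#include <linux/pagemap.h>

static loff_t merkle_cache_pos(const struct inode *inode)
{
        /*
         * Rounding i_size up to the largest folio size the page cache
         * can ever use means no folio can span both file data
         * (including post-EOF zeroing) and Merkle tree metadata,
         * regardless of PAGE_SIZE or large-folio support.  The on-disk
         * offset of the tree is unaffected.
         */
        return roundup(i_size_read(inode),
                       mapping_max_folio_size_supported());
}

Only where the Merkle tree pages sit in the address_space moves; the
on-disk format stays as each filesystem defined it.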