On Thu, Feb 04, 2021 at 10:58:19PM -0800, Boris Burkov wrote: > On Thu, Feb 04, 2021 at 10:13:54PM -0800, Eric Biggers wrote: > > On Thu, Feb 04, 2021 at 03:21:36PM -0800, Boris Burkov wrote: > > > This patchset provides support for fsverity in btrfs. > > > > Very interested to see this! It generally looks good, but I have some > > comments. > > > > Also, when you send this out next, can you include > > linux-fscr...@vger.kernel.org, as per 'get_maintainer.pl fs/verity/'? > > > > Sorry for missing that, definitely will do for v2. > > > > At a high level, we store the verity descriptor and Merkle tree data > > > in the file system btree with the file's inode as the objectid, and > > > direct reads/writes to those items to implement the generic fsverity > > > interface required by fs/verity/. > > > > > > The first patch is a preparatory patch which adds a notion of > > > compat_flags to the btrfs_inode and inode_item in order to allow > > > enabling verity on a file without making the file system unmountable for > > > older kernels. (It runs afoul of the leaf corruption check otherwise)
> > In ext4, verity is a ro_compat filesystem feature rather than a compat > > feature. > > That's because we wanted to prevent old kernels from writing to verity > > files, > > which would corrupt them (get them out of sync with their Merkle trees). > > > > Are you sure you want to make this a "compat" flag? > > > > I wasn't sure, so I'm glad you brought it up. That's a pretty compelling > argument for making it ro_comnpat, in my opinion. I was also worried > about the old kernel deleting the file and leaking the Merkle items. > > On the other hand, it feels a shame to make the whole file system read > only over "just one file". Read only over "just one file" isn't new in btrfs. More shameful things have already been implemented. ;) Any random user can run 'setfattr -n btrfs.compression -v zstd .' and flip the COMPRESS_ZSTD incompat bit on the filesystem. After that, the filesystem can't be mounted on kernel 4.13 or earlier ever again, not even ro. Same thing with LZO on earlier kernels. [1] > Do you have any good strategies for getting back a file system after > creating some verity files but then running a kernel without verity? There are a few btrfs incompat bits that can be turned off, but they require expunging any incompat metadata from the filesystem, so they are only possible as offline (like btrfs check or btrfstune) or mount-time changes (like -o space_cache=v2), or as part of operations that can iterate over the whole filesystem while online (like balance with convert to remove new RAID profiles). Generally anything that operates on scales smaller than a block group (like inodes or extents) can't be turned off. > I could write some utilities to list/delete verity files before doing > that transition? > > > > > > > The second patch is the bulk of the fsverity implementation. It > > > implements the fsverity interface and adds verity checks for the typical > > > file reading case. > > > > > > The third patch cleans up the corner cases in readpage, covering inline > > > extents, preallocated extents, and holes. > > > > > > The fourth patch handles direct io of a veritied file by falling back to > > > buffered io. > > > > > > The fifth patch adds a feature file in sysfs for verity. > > > > I'm also wondering if you've tested using this in combination with btrfs > > compression. f2fs also supports compression and verity in combination, and > > there have been some problems caused by that combination not being properly > > tested. It should just work though. > > > > I hadn't tested it with compression yet, but I'll definitely do so, > especially since it was a pain point before. Thanks for the tip. > > > - Eric [1] It could have been possible to avoid using incompat bits for compression algorithms: return ENOTSUPP on reads of data with unknown algorithms, fall back to uncompressed on writes, use the raw encoded data in send/receive, and even dedupe sometimes, if the encoded data and algorithm ID matches. Balance and scrub already use only the encoded form and don't need to decompress to move the data around or verify it against csums. If the filesystem is mounted with a new kernel again, then everything the old kernel did to the filesystem would be OK: overwrites and deletes would work, old and new data would all be readable. Users might be alarmed by getting unexpected read errors on old kernels (hence the incompat bit) but technically the filesystem could have been mountable. This is an unusual "wo_compat" bit's use case--writing is possible with an old kernel, reading is not. Verity doesn't fit this model. There's no way to fall back to naively not updating Merkle trees that is distinguishable from corrupting the data.