On Sat, Sep 17, 2016 at 06:37:16AM +0000, Alex Elsayed wrote:
> > Encryption in ext4 is a per-directory-tree affair. One starts by
> > setting an encryption policy (using an ioctl() call) for a given
> > directory, which must be empty at the time; that policy includes a
> > master key used for all files and directories stored below the target
> > directory. Each individual file is encrypted with its own key, which is
> > derived from the master key and a per-file random nonce value (which is
> > stored in an extended attribute attached to the file's inode). File
> > names and symbolic links are also encrypted.
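For reference, a toy model of the per-file key scheme quoted above: one master key per policy, one random nonce per file, and a key derived from the pair. This is not ext4's actual key derivation. HMAC-SHA256 is a stand-in KDF, and the 16-byte nonce size and the function names are assumptions for illustration.

```python
import hashlib
import hmac
import os

NONCE_SIZE = 16  # assumed nonce length, in bytes

def new_file_nonce() -> bytes:
    """Random per-file nonce; ext4 stores this in an xattr on the inode."""
    return os.urandom(NONCE_SIZE)

def derive_file_key(master_key: bytes, nonce: bytes) -> bytes:
    """Stand-in KDF (HMAC-SHA256), not the derivation ext4 actually uses."""
    return hmac.new(master_key, nonce, hashlib.sha256).digest()

master_key = os.urandom(32)  # one master key for the whole directory tree
key_a = derive_file_key(master_key, new_file_nonce())
key_b = derive_file_key(master_key, new_file_nonce())
print(key_a != key_b)  # True: each file gets its own key
```

The nonce is the only per-file input, so whichever object stores it decides which data can be decrypted with which derived key.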

Probably the simplest way to map this to btrfs is to move the nonce from the inode to the extent. Inodes aren't unique within a btrfs filesystem, extents can be shared by multiple inodes, and a single extent can appear multiple times in the same inode at different offsets. Attaching the nonce to the inode would not be sufficient to read the extent in all but the special case of a single reference at the original offset where it was written, and it also leads to the replay problems with duplicate inodes you pointed out. Extents in a btrfs filesystem are unique and carry their own attributes (e.g. compression format, checksums) and reference count. They can easily carry a reference to an encryption policy object and a nonce attribute.

Nonces within metadata are more complicated. btrfs doesn't have directory files like ext4 does, so it doesn't get directory filename encryption for free with file encryption. Encryption could be done per-item in the metadata trees, but in the special case of directories that happen to be the roots of subvols, it would be possible to encrypt entire pages of metadata at a time (with the caveat that a snapshot would require a shared encryption policy between the origin and snapshot subvols). This is what makes keys at the subvol root level so attractive.

> So there isn't quite a "subvol key" in the VFS approach - each directory
> has a key, and there are derived keys for the entries below it. (I'll
> note that this framing does not address shared extents _at all_, and
> would love to have clarification on that).

Files are modified by creating new extents (using parameters inherited from the inode to fill in the extent attributes) and updating the inode to refer to the new extent instead of the old one at the modified offset. Cloned extents are references to existing extents associated with a different inode or at a different place within the same inode (if the extent is not compatible with the destination inode, clone fails with an error).
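A minimal sketch of the extent-attribute model described above. Extent, Inode, write, clone and CloneError are hypothetical names, not actual btrfs structures or ioctls. A write allocates a new extent carrying the inode's policy reference and a fresh nonce; a clone adds another reference to the existing extent and refuses if the policies differ. Because the nonce travels with the extent, every reference to it, in any inode and at any offset, has what it needs to decrypt.

```python
import os
from dataclasses import dataclass, field
from typing import Dict

@dataclass(frozen=True)
class Extent:
    """Hypothetical extent record: immutable once written, may be shared."""
    policy_id: str  # reference to an encryption policy object
    nonce: bytes    # per-extent nonce, stored with the extent, not the inode
    length: int

@dataclass
class Inode:
    policy_id: str  # policy inherited by new extents written to this inode
    extents: Dict[int, Extent] = field(default_factory=dict)  # offset -> extent

class CloneError(Exception):
    """Raised when an extent is incompatible with the destination inode."""

def write(inode: Inode, offset: int, length: int) -> Extent:
    """Copy-on-write: allocate a new extent with the inode's policy and a
    fresh nonce, then point the inode at it instead of the old extent."""
    ext = Extent(inode.policy_id, os.urandom(16), length)
    inode.extents[offset] = ext
    return ext

def clone(src: Inode, src_off: int, dst: Inode, dst_off: int) -> None:
    """Add another reference to an existing extent, in a different inode
    or at a different offset in the same inode."""
    ext = src.extents[src_off]
    if ext.policy_id != dst.policy_id:
        raise CloneError("extent policy does not match destination inode")
    dst.extents[dst_off] = ext  # the nonce travels with the shared extent

# One extent referenced twice in the same inode, at offsets 0 and 8192.
f = Inode("policy-A")
e = write(f, 0, 4096)
clone(f, 0, f, 8192)
```

Cloning from f into an inode under "policy-B" would raise CloneError, which models the "clone fails with an error" case rather than any specific errno.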

A snapshot is an efficient way to clone an entire subvol tree at once, including all inodes and attributes.

Inode attributes and extent attributes can sometimes conflict, especially during a clone operation. Encryption attributes could become one of these cases: clone would have to refuse to attach an extent written under one encryption policy to an inode under a different encryption policy.

> > I don't see how snapshots could work, writable or otherwise, without
> > separating the key identity from the subvol identity and having a
> > many-to-one relationship between subvols and keys. The extents in each
> > subvol would be shared, and they'd be encrypted with a single secret,
> > so there's not really another way to do this.
>
> That's not the issue. The issue is that, assuming the key stays the same,
> then a user could quite possibly create a snapshot, write into both the
> original and the snapshot, causing encryption to occur twice with the
> same key, same nonce, and different data.

If the extents have nonces (and inodes do not) then this doesn't happen. A write to either snapshot necessarily creates new extents, each with its own fresh nonce, in all cases (the nodatacow feature, the only way to modify a data extent in place, is disabled when the extent is shared).
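A toy model of the snapshot case, assuming the origin and snapshot share one key (one encryption policy) and counting a nonce that encrypts two different pieces of data as the failure described in the quote. The dict-of-offsets "tree" and all function names are hypothetical. With per-extent nonces every copy-on-write draws a new nonce; with a single per-inode nonce, the writes to the origin and to the snapshot reuse it.

```python
import os
from dataclasses import dataclass
from typing import Dict, Optional, Set

@dataclass(frozen=True)
class Extent:
    nonce: bytes  # nonce used to encrypt this extent's data
    data: bytes   # plaintext, kept only so the model can compare writes

Tree = Dict[int, Extent]  # hypothetical subvol: file offset -> extent

def snapshot(tree: Tree) -> Tree:
    """Snapshot: a new tree that shares every existing extent with the origin."""
    return dict(tree)

def cow_write(tree: Tree, offset: int, data: bytes,
              inode_nonce: Optional[bytes] = None) -> None:
    """Every write allocates a new extent (shared extents are never modified
    in place). The nonce is fresh per extent unless a per-inode nonce is
    supplied, which models the ext4-style layout."""
    nonce = inode_nonce if inode_nonce is not None else os.urandom(16)
    tree[offset] = Extent(nonce, data)

def reused_nonces(*trees: Tree) -> Set[bytes]:
    """Nonces that encrypt different data under the shared key."""
    first_seen: Dict[bytes, bytes] = {}
    reused: Set[bytes] = set()
    for tree in trees:
        for ext in tree.values():
            if first_seen.setdefault(ext.nonce, ext.data) != ext.data:
                reused.add(ext.nonce)
    return reused

# Per-extent nonces: snapshot, then write into both origin and snapshot.
origin: Tree = {}
cow_write(origin, 0, b"base")
snap = snapshot(origin)
cow_write(origin, 0, b"origin edit")
cow_write(snap, 0, b"snapshot edit")
print(reused_nonces(origin, snap))  # set()

# Per-inode nonce: the same sequence of writes reuses one nonce.
n = os.urandom(16)
origin2: Tree = {}
cow_write(origin2, 0, b"base", inode_nonce=n)
snap2 = snapshot(origin2)
cow_write(origin2, 0, b"origin edit", inode_nonce=n)
cow_write(snap2, 0, b"snapshot edit", inode_nonce=n)
print(reused_nonces(origin2, snap2) == {n})  # True
```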