On Fri, 08 Dec 2023, Kent Overstreet wrote: > On Fri, Dec 08, 2023 at 12:16:53PM +1100, NeilBrown wrote: > > On Thu, 07 Dec 2023, Donald Buczek wrote: > > > > > > I'd vote for the subvol-id. I think, with bcachefs a u32 would do, > > > but it could be return as u64 for future growth or other filesystems. > > > > Probably a good choice. Though I'd like to re-emphasise that even doing > > this doesn't remove the need to make sure inode numbers are as unique as > > they can possibly be made. There are multiple places in /proc (for > > example) where a file is identified by fsid and inode number. It isn't > > really practical to insert a volume-id in there (though maybe we could > > duplicate all those interfaces....). > > > > Inode numbers would need to be absolutely unique with a volume, and > > mostly unique within a filesystem. xor-ing the internal inode number > > with a hash of the volume number is one way to achieve this. > > > > i.e.: while enabling processes that use stat() family function is > > clearly import, it does not solve all the problems caused by the > > addition of a volume/subvol level between "filesystem" and "inode". > > So, I'm hoping we can start pushing the NFS file handle as a standard > interface for unique file ID, but since we clearly need something > interim/for existing programs: > > xoring in the subvolume ID sounds like it shouldn't have any real > downside, do we want to make that the default?
The only downside is that it does not provide 100% uniqueness so it could give people a false sense of security. You would need to be careful that the xor doesn't result in any reserved inode numbers - 0 in particular. That isn't hard though. If you are going to do it, then don't even make it an option - always do it. > > Another possibility: we do have the ability to restrict inode numbers to > 32 bits, I've been contemplating using this to report the subvol ID in > the top 32 bits. Not my preferred approach, since we currently use the > top bits of the inode number for sharding by cpu ID and this would be > incompatible with that option, but sounds like it might be worth adding. I think this is far-and-away the best option - for bcachefs. All the problems disappear. Can you really not use other bits to shard by cpu ID? For btrfs, it would probably be better to stick with 64bit inode numbers so that bcachefs and btrfs have similar problems and can present a joint front in trying to solve them. (The only reason btrfs cannot use just 32bits for inode numbers is that they never re-use inode numbers. !?!?!?! ) > > For plumbing in an option for how to include the subvolume ID in the > inode number - we'll add it to fs/bcachefs/opts.h as an OPT_STR() option > (i.e. an enum), reserve a few bits in the superblock for it, specify in > opts.h when it can be set (at mount, runtime, format time). > > statx sounds like a no brainer. I don't have the bandwidth to get into > that right now, but: inode_inum().subvol is the bcachefs interface for > getting it. (Not bi_subvolume in the on disk inode, that's only set on > subvolume roots). > > How does that sound on the bcachefs side, Neil? > I'm against making things optional, and don't think sharding (or not reusing inode numbers) is a good excuse to cause Posix incompatibility. But other than that, it makes sense. Thanks, NeilBrown
