On 12/8/23 03:49, Kent Overstreet wrote:
> We really only need 6 or 7 bits out of the inode number for sharding;
> then 20-32 bits (nobody's going to have a billion snapshots; a million
> is a more reasonable upper bound) for the subvolume ID leaves 30 to 40
> bits for actually allocating inodes out of.
>
> That'll be enough for the vast, vast majority of users, but exceeding
> that limit is already something we're technically capable of: we're
> currently seeing filesystems well over 100 TB, petabyte range expected
> as fsck gets more optimized and online fsck comes.

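As a rough sketch of the bit budget quoted above, assuming the fields are carved out of a single 64-bit inode number (the function name `remaining_inode_bits` and the 64-bit total are illustrative assumptions, not anything taken from bcachefs):

```python
def remaining_inode_bits(shard_bits: int, subvol_bits: int, total_bits: int = 64) -> int:
    """Return the bits left for allocating inodes after the shard and
    subvolume ID fields are taken out of a total_bits-wide inode number.

    total_bits=64 is an assumption made for this illustration.
    """
    remaining = total_bits - shard_bits - subvol_bits
    if remaining < 0:
        raise ValueError("shard and subvolume fields exceed the total width")
    return remaining


# Field widths taken from the quoted ranges: 6 or 7 shard bits,
# 20 to 32 subvolume ID bits.
for shard_bits in (6, 7):
    for subvol_bits in (20, 32):
        bits = remaining_inode_bits(shard_bits, subvol_bits)
        print(f"shard={shard_bits} subvol={subvol_bits}: "
              f"{bits} bits left, {2**bits} inode numbers")
```

Under these assumptions a 20-bit subvolume ID leaves 37 or 38 bits for inode allocation, and a 32-bit one leaves 25 or 26.
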
30 bits would not be enough even today:

    buczek@done:~$ df -i /amd/done/C/C8024
    Filesystem         Inodes     IUsed      IFree IUse% Mounted on
    /dev/md0       2187890304 618857441 1569032863   29% /amd/done/C/C8024

So that's 32 bit on a random production system ( 618857441 == 0x24e303e1 ).

And if the idea to produce unique inode numbers by hashing the filehandle into 64 bits is followed, collisions definitely need to be addressed. With 618857441 objects, the probability of a hash collision with 64 bits is already over 1% [1].

[1] https://en.wikipedia.org/wiki/Birthday_problem

    from math import exp

    n = 618857441   # objects (IUsed from the df output above)
    d = 2**64       # size of the 64-bit hash space
    # birthday-problem approximation: p ~= 1 - exp(-n^2 / (2*d))
    print(1-exp(-(n**2)/(2*d)))
    # 0.010327121831036457

D.

> So I can't bake in a limit like that, we need to keep our options open
> :)
>
>> For btrfs, it would probably be better to stick with 64bit inode numbers
>> so that bcachefs and btrfs have similar problems and can present a joint
>> front in trying to solve them.
>>
>> (The only reason btrfs cannot use just 32bits for inode numbers is that
>> they never re-use inode numbers. !?!?!?! )
>
> That is a !?!?!?. bcachefs certainly can, and it's not hard, we leave
> around an inode_generation key when we delete an inode.
>
>> I'm against making things optional, and don't think sharding (or not
>> reusing inode numbers) is a good excuse to cause Posix incompatibility.
>> But other than that, it makes sense.
>
> I just don't think posix compatibility is realistically possible in all
> situations we've got coming - especially with overlayfs to consider.
> People are also starting to want to do fun things with container
> filesystems, that's becoming a hot topic as well and depending on how
> it's done the problem of combining inode numbers from multiple
> filesystems into one coherent "view" may come up again.
>
> So I think we also need to be designing something that we know is going
> to work and relaxing our constraints a bit.

-- 
Donald Buczek
[email protected]
Tel: +49 30 8413 1433
