On Thu, 30 Nov 2023, Kent Overstreet wrote:
> On Thu, Nov 30, 2023 at 08:35:12AM +0100, Donald Buczek wrote:
> > On 11/29/23 22:43, Kent Overstreet wrote:
> > > On Tue, Nov 28, 2023 at 08:49:23AM +0100, Donald Buczek wrote:
> > > > Hi,
> > > > 
> > > > I'm happy, bcachefs finally made it to mainline, Kudos to Kent!
> > > > 
> > > > As far as I know, there is an ongoing discussion about the problems of 
> > > > non-unique inode numbers exposed by snapshots an no real conclusion 
> > > > yet, on how filesystems should expose snapshots.
> > > > 
> > > > The current behavior is kind of a showstopper for us, because we are 
> > > > still running multi-user systems. The problem is, that any unprivileged 
> > > > user can create subvolumes and snapshots:
> > > > 
> > > >      buczek@dose:/scratch/local3$ bcachefs subvolume create vol1
> > > >      buczek@dose:/scratch/local3$ mkdir vol1/dir1
> > > >      buczek@dose:/scratch/local3$ bcachefs subvolume snapshot vol1/snp1
> > > >      buczek@dose:/scratch/local3$ ls -li vol1/
> > > >      total 0
> > > >      1342189197 drwxrwxr-x 2 buczek buczek 0 Nov 20 15:01 dir1
> > > >      1476413180 drwxrwxr-x 3 buczek buczek 0 Nov 20 15:01 snp1
> > > >      buczek@dose:/scratch/local3$ ls -li vol1/snp1/
> > > >      total 0
> > > >      1342189197 drwxrwxr-x 2 buczek buczek 0 Nov 20 15:01 dir1
> > > >      buczek@dose:/scratch/local3$ find .
> > > >      .
> > > >      ./vol1
> > > >      find: File system loop detected; ‘./vol1/snp1’ is part of the same 
> > > > file system loop as ‘./vol1’.
> > > >      ./vol1/dir1
> > > >      buczek@dose:/scratch/local3$
> > > > 
> > > > We have a few tools which walk over the filesystem (backup, mirror, 
> > > > accounting) and these are just not prepared for non-unique inode 
> > > > numbers aside from regular hardlinks. I'm concerned that experiments of 
> > > > a unprivileged users could make important tools to fail.
> > > > 
> > > > Questions:
> > > > 
> > > > - Would it be a workaround to make creation of subvolumes and snapshots 
> > > > privileged operations?
> > > > 
> > > > - If we want to evolve our tools: What is the best way for userspace to 
> > > > recognize subvolumes and snapshots and tell them apart from ordinary 
> > > > directories?
> > > 
> > > I'm considering writing an "integer identifiers considered harmful"
> > > paper. I haven't dug into it yet - just skimmed the lwn coverage - but
> > > adding in the overlayfs issue makes this problem sound fundamentally
> > > unsolvable given the current constraints.
> > > 
> > > Recognizing subvolume boundaries: we don't have anything for this yet,
> > > and given what happened with btrfs I'm not sure we want to go that
> > > route.
> > 
> > A fabricated fsid - I think, that is what btrfs does - wouldn't help
> > us either. The mentioned tools would just stop at the assumed mount
> > points.
> 
> *nod* - nor for NFS.
> 
> > > We could (probably) expose unique inode numbers, filesystem wide, but at
> > > a cost. We'd have to restrict ourselves internally to 32 bit inode
> > > numbers, and then report inode numbers that include the subvolume ID as
> > > the high 32 bits.
> > > 
> > > That won't leave enough bits to shard inode numbers at create time based
> > > on CPU id, which isn't the _most_ important optimization, but would hurt
> > > to lose.
> > > 
> > > Going forward, we really need - as you allude to - better userspace
> > > APIs. Inode number probably can't be an integer anymore, it needs to be
> > > a string if it's going to be able to do all the things we want it to do.
> > 
> > Should be an uuid then. Expose inode uuid (and/or snapshot uuid) via
> > xattr? Anyway, I know, people more competent than me are thinking
> > about the problem.
> 
> Exposing the subvolume ID via an xattr would be easy, if that works for
> you.

I think that would be a bad choice.  Stick with the filehandle.  It is
an existing interface supported by many filesystems and it works over
NFS.

NeilBrown


> 
> > What to you think about the suggestion to make creation of snapshots
> > (and possible volumes, not sure about that) a privileged operation
> > (optional, of course) ? I know, multi-user is niche nowadays, but for
> > me it looks like a rather cheap workaround.
> 
> Yeah, that's not an unreasonable ask.
> 


Reply via email to