On 12/13/23 13:28, Kent Overstreet wrote:
> On Wed, Dec 13, 2023 at 08:37:57AM +0100, Donald Buczek wrote:
>> Probably not for the specific applications I mentioned (backup, mirror,
>> accounting). These are intended to run continuously, slowly and unnoticed
>> in the background, so they are memory and i/o throttled via cgroups anyway
>> and one is even using sleep after so-and-so many stat calls to reduce
>> its impact.
>>
>> If they could tell a directory from a snapshot, I would probably stop them
>> from walking into snapshots. And if not, the snapshot id is all that is
>> needed to tell a clone in a snapshot from a hardlink. So these don't really
>> need the filehandle.
>
> Perhaps we should allocate a bit for differentiating a snapshot from a
> non snapshot subvolume?

Are there non-snapshot subvolumes?

From debugfs bcachefs/../btrees, I've got the impression that every
subvolume starts with a (single) snapshot.

New filesystem:
subvolumes
==========
u64s 10 type subvolume 0:1:0 len 0 ver 0: root 4096 snapshot id 4294967295 parent 0
snapshots
=========
u64s 10 type snapshot 0:4294967295:0 len 0 ver 0: is_subvol 1 deleted 0 parent 0 children 0 0 subvol 1 tree 1 depth 0 skiplist 0 0 0
`bcachefs subvolume create /mnt/v`
subvolumes
==========
u64s 10 type subvolume 0:1:0 len 0 ver 0: root 4096 snapshot id 4294967295 parent 0
u64s 10 type subvolume 0:2:0 len 0 ver 0: root 1207959552 snapshot id 4294967294 parent 0
snapshots
=========
u64s 10 type snapshot 0:4294967294:0 len 0 ver 0: is_subvol 1 deleted 0 parent 0 children 0 0 subvol 2 tree 2 depth 0 skiplist 0 0 0
u64s 10 type snapshot 0:4294967295:0 len 0 ver 0: is_subvol 1 deleted 0 parent 0 children 0 0 subvol 1 tree 1 depth 0 skiplist 0 0 0
`bcachefs subvolume snapshot /mnt/v /mnt/s`
subvolumes
==========
u64s 10 type subvolume 0:1:0 len 0 ver 0: root 4096 snapshot id 4294967295 parent 0
u64s 10 type subvolume 0:2:0 len 0 ver 0: root 1207959552 snapshot id 4294967292 parent 0
u64s 10 type subvolume 0:3:0 len 0 ver 0: root 1207959552 snapshot id 4294967293 parent 2
snapshots
=========
u64s 10 type snapshot 0:4294967292:0 len 0 ver 0: is_subvol 1 deleted 0 parent 4294967294 children 0 0 subvol 2 tree 2 depth 1 skiplist 4294967294 4294967294 4294967294
u64s 10 type snapshot 0:4294967293:0 len 0 ver 0: is_subvol 1 deleted 0 parent 4294967294 children 0 0 subvol 3 tree 2 depth 1 skiplist 4294967294 4294967294 4294967294
u64s 10 type snapshot 0:4294967294:0 len 0 ver 0: is_subvol 0 deleted 0 parent 0 children 4294967293 4294967292 subvol 0 tree 2 depth 0 skiplist 0 0 0
u64s 10 type snapshot 0:4294967295:0 len 0 ver 0: is_subvol 1 deleted 0 parent 0 children 0 0 subvol 1 tree 1 depth 0 skiplist 0 0 0
Now reading and interpreting the filehandles:
/mnt/. type 177 : 00 10 00 00 00 00 00 00 01 00 00 00 00 00 00 00 : inode 0000000000001000 subvolume 00000001 generation 00000000
/mnt/v type 177 : 00 00 00 48 00 00 00 00 02 00 00 00 00 00 00 00 : inode 0000000048000000 subvolume 00000002 generation 00000000
/mnt/s type 177 : 00 00 00 48 00 00 00 00 03 00 00 00 00 00 00 00 : inode 0000000048000000 subvolume 00000003 generation 00000000
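
For reference, the interpretation above can be reproduced with a few
lines of Python. This is only a sketch: the layout (little-endian u64
inode, u32 subvolume, u32 generation) is my reading of the 16 bytes
shown above, not something taken from a documented ABI, and
decode_handle() is a made-up helper name.

```python
import struct

# Assumed layout, inferred from the dumps above (not a documented ABI):
# little-endian u64 inode, u32 subvolume, u32 generation = 16 bytes.
HANDLE_FORMAT = "<QII"

def decode_handle(hex_bytes: str) -> dict:
    """Decode a space-separated hex dump of a 16-byte file handle."""
    raw = bytes.fromhex(hex_bytes.replace(" ", ""))
    inode, subvolume, generation = struct.unpack(HANDLE_FORMAT, raw)
    return {"inode": inode, "subvolume": subvolume, "generation": generation}

# Handle bytes copied from the output above.
handles = {
    "/mnt/.": "00 10 00 00 00 00 00 00 01 00 00 00 00 00 00 00",
    "/mnt/v": "00 00 00 48 00 00 00 00 02 00 00 00 00 00 00 00",
    "/mnt/s": "00 00 00 48 00 00 00 00 03 00 00 00 00 00 00 00",
}

for path, hex_bytes in handles.items():
    h = decode_handle(hex_bytes)
    print(f"{path} inode {h['inode']:016x} "
          f"subvolume {h['subvolume']:08x} generation {h['generation']:08x}")
```

Note that /mnt/v and /mnt/s decode to the same inode (0x48000000, i.e.
1207959552, the root shown in the subvolumes btree) and differ only in
the subvolume field.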

So is there really a type difference between the objects created by
`bcachefs subvolume create` and `bcachefs subvolume snapshot`? It appears
that they both point to a subvolume, which in turn points to a snapshot
in the snapshot tree.
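
To make the comparison concrete, here is a small sketch that follows
subvolume -> snapshot for the last dump above. The dictionaries are
hand-copied from that dump and describe() is a made-up helper; nothing
here reads the btrees itself.

```python
# Values hand-copied from the dump taken after
# `bcachefs subvolume snapshot /mnt/v /mnt/s`.
subvolumes = {
    1: {"root": 4096, "snapshot": 4294967295, "parent": 0},
    2: {"root": 1207959552, "snapshot": 4294967292, "parent": 0},
    3: {"root": 1207959552, "snapshot": 4294967293, "parent": 2},
}
snapshots = {
    4294967292: {"is_subvol": 1, "parent": 4294967294, "subvol": 2, "tree": 2, "depth": 1},
    4294967293: {"is_subvol": 1, "parent": 4294967294, "subvol": 3, "tree": 2, "depth": 1},
    4294967294: {"is_subvol": 0, "parent": 0, "subvol": 0, "tree": 2, "depth": 0},
    4294967295: {"is_subvol": 1, "parent": 0, "subvol": 1, "tree": 1, "depth": 0},
}

def describe(subvol_id: int) -> dict:
    """Follow a subvolume to its snapshot node and collect the fields to compare."""
    sv = subvolumes[subvol_id]
    snap = snapshots[sv["snapshot"]]
    return {
        "subvol_parent": sv["parent"],
        "is_subvol": snap["is_subvol"],
        "snapshot_parent": snap["parent"],
        "tree": snap["tree"],
        "depth": snap["depth"],
    }

for subvol_id in (2, 3):  # 2 is /mnt/v, 3 is /mnt/s
    print(subvol_id, describe(subvol_id))
```

In this dump, the snapshot nodes of /mnt/v and /mnt/s look the same
(leaf, is_subvol 1, same parent, same tree, same depth). The only
per-subvolume difference visible here is the `parent` field of the
subvolume key (0 vs. 2); whether that is meant to be a reliable
discriminator, I can't tell from the dump alone.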
Best
Donald
>> In the thread it was assumed that there are other (unspecified)
>> applications which need the filehandle and currently use name_to_handle_at().
>>
>> I thought it was self-evident that a single syscall to retrieve all
>> information atomically is better than a set of syscalls. Each additional
>> syscall has overhead and you need to be concerned with the data changing
>> between the calls.
>
> All other things being equal, yeah it would be. But things are never
> equal :)
>
> Expanding struct statx is not going to be as easy as hoped, so we need
> to be a bit careful how we use the remaining space, and since as Dave
> pointed out the filehandle isn't needed for checking uniqueness unless
> nlink > 1 it's not really a hotpath in any application I can think of.
>
> (If anyone does know of an application where it might matter, now's the
> time to bring it up!)
>
>> Userspace nfs server as an example of an application, where visible
>> performance is more relevant, was already mentioned by someone else.
>
> I'd love to hear confirmation from someone more intimately familiar with
> NFS, but AFAIK it shouldn't matter there; the filehandle exists to
> support resuming IO or other operations to a file (because the server
> can go away and come back). If all the client did was a stat, there's no
> need for a filehandle - that's not needed until a file is opened.
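
The point quoted above, that uniqueness only needs checking when
nlink > 1, is what a tree walker would look like roughly as below.
This is a sketch, not taken from any of the tools I mentioned:
count_unique_files() and its `key` parameter are made up, and the
default key (st_dev, st_ino) is exactly the part that may not be
sufficient across subvolumes, so it is left pluggable.

```python
import os

def count_unique_files(top, key=lambda st: (st.st_dev, st.st_ino)):
    """Count files under `top`, counting each hardlinked file only once.

    `key` maps a stat result to an identity. The default is an assumption
    that holds on classic filesystems; a filesystem with subvolumes may
    need something stronger (e.g. a subvolume id or a file handle).
    """
    seen = set()
    unique = 0
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            # Only files with more than one link can show up twice,
            # so the lookup is skipped for everything else.
            if st.st_nlink > 1:
                identity = key(st)
                if identity in seen:
                    continue
                seen.add(identity)
            unique += 1
    return unique
```

For files with nlink == 1, the walker never touches the `seen` set, so
whatever extra call is needed to get a stronger identity would only be
paid for hardlinked files.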
--
Donald Buczek
[email protected]
Tel: +49 30 8413 1433