On Fri, Aug 28, 2020 at 04:20:10PM +0300, Nikolay Borisov wrote:
> When a snapshot is created its root item is inserted in the root tree
> with the 'offset' field set to the transaction id when the snapshot
> was created. Immediately afterwards the offset is set to -1 so when
> the same key is used to create DIR_ITEM/DIR_INDEX in the destination
> file tree its offset is really set to -1. Root tree item:
> 
>     item 13 key (258 ROOT_ITEM 7) itemoff 12744 itemsize 439
>         generation 7 root_dirid 256 bytenr 30703616 level 0 refs 1
>         lastsnap 7 byte_limit 0 bytes_used 16384 flags 0x0(none)
>         uuid f13abf0d-b1f5-f34b-a179-fd1c2f89e762
>         parent_uuid 51a74677-a077-4c21-bd87-2141a147ff85
>         ctransid 7 otransid 7 stransid 0 rtransid 0
>         ctime 1598441149.466822752 (2020-08-26 11:25:49)
>         otime 1598441149.467474846 (2020-08-26 11:25:49)
>         drop key (0 UNKNOWN.0 0) level 0
> 
> DIR_INDEX item for the same rooti in the destination fs tree:
> 
> item 5 key (256 DIR_INDEX 9) itemoff 15967 itemsize 39
>         location key (258 ROOT_ITEM 18446744073709551615) type DIR
>         transid 7 data_len 0 name_len 9
>         name: snapshot1
> 
> The location key is generally used to read the root. This is not a
> problem per-se since the function dealing with root searching
> (btrfs_find_root) is well equipped to deal with offset being -1, namely:
> 
>     If ->offset of 'search_key' is -1ULL, it means we are not sure the
>     offset of the search key, just lookup the root with the highest
>     offset for a given objectid.
> 
> However this is a needless inconcistency in the way internal data
> structures are being created. This patch modifies the behavior so that
> DIR_INDEX/DIR_ITEM will have the offset field of the location key set
> to the transid. While this results in a change of the on-disk metadata,
> it doesn't constitute a functional change since older kernels can cope
> with both '-1' and transid as values of the offset field so no INCOMPAT
> flags are needed.
> 
> Finally while at it also move the initialization of the key in
> create_pending_snapshot closer to where it's being used for the first
> time.

For the record I'll reply here too.

This is technically correct and probably wouldn't cause any noticeable
problem. The main concern here is that it's changing the item data from
a format that's been out there forever. Once this patch would land, the
dir items will have a different location key, making it inconsistent.

That the root item key itself has the transaction id is part of the
format, that dir item location keys have the -1 in the offset is as
well.  That the two are different and not the same is maybe unfortunate
but certainly not harmful.

Changing the format even a little always brings a risk and we never know
in advance. We only learn from mistakes here and did a few in the past.
My gut feeling is to keep it as-is, or do it properly with incompat
bits if needed. In this case it's not necessary so we'll keep things
unchanged.

> Signed-off-by: Nikolay Borisov <nbori...@suse.com>
> ---
>  fs/btrfs/transaction.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 7243532cd34b..83f4e7056488 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -1509,9 +1509,6 @@ static noinline int create_pending_snapshot(struct 
> btrfs_trans_handle *trans,
>                       goto clear_skip_qgroup;
>       }
>  
> -     key.objectid = objectid;
> -     key.offset = (u64)-1;
> -     key.type = BTRFS_ROOT_ITEM_KEY;
>  
>       rsv = trans->block_rsv;
>       trans->block_rsv = &pending->block_rsv;
> @@ -1613,7 +1610,9 @@ static noinline int create_pending_snapshot(struct 
> btrfs_trans_handle *trans,
>  
>       btrfs_set_root_node(new_root_item, tmp);
>       /* record when the snapshot was created in key.offset */
> +     key.objectid = objectid;
>       key.offset = trans->transid;
> +     key.type = BTRFS_ROOT_ITEM_KEY;

This can be sent as a cleanup, it's better to keep the key intialization
at one place.

>       ret = btrfs_insert_root(trans, tree_root, &key, new_root_item);
>       btrfs_tree_unlock(tmp);
>       free_extent_buffer(tmp);
> @@ -1634,7 +1633,6 @@ static noinline int create_pending_snapshot(struct 
> btrfs_trans_handle *trans,
>               goto fail;
>       }
>  
> -     key.offset = (u64)-1;

And a comment explaining the difference between the root item and dir
item keys.

>       pending->snap = btrfs_get_new_fs_root(fs_info, objectid, 
> pending->anon_dev);
>       if (IS_ERR(pending->snap)) {
>               ret = PTR_ERR(pending->snap);
> -- 
> 2.17.1

Reply via email to