On Fri, Mar 02, 2018 at 11:29:33AM -0700, Liu Bo wrote:
> On Wed, Feb 28, 2018 at 03:56:10PM +0000, fdman...@kernel.org wrote:
> > From: Filipe Manana <fdman...@suse.com>
> > 
> > If we have a file with 2 (or more) hard links in the same directory,
> > remove one of the hard links, create a new file (or link an existing file)
> > in the same directory with the name of the removed hard link, and then
> > finally fsync the new file, we end up with a log that fails to replay,
> > causing a mount failure.
> > 
> > Example:
> > 
> >   $ mkfs.btrfs -f /dev/sdb
> >   $ mount /dev/sdb /mnt
> > 
> >   $ mkdir /mnt/testdir
> >   $ touch /mnt/testdir/foo
> >   $ ln /mnt/testdir/foo /mnt/testdir/bar
> > 
> >   $ sync
> > 
> >   $ unlink /mnt/testdir/bar
> >   $ touch /mnt/testdir/bar

So here, "unlink & touch" similar to rename, therefore we could also
be conservative and set /mnt/testdir 's last_unlink_trans to force to
commit transaction.

Thanks,

-liubo

> >   $ xfs_io -c "fsync" /mnt/testdir/bar
> > 
> >   <power failure>
> > 
> >   $ mount /dev/sdb /mnt
> >   mount: mount(2) failed: /mnt: No such file or directory
> > 
> > When replaying the log, for that example, we also see the following in
> > dmesg/syslog:
> > 
> >   [71813.671307] BTRFS info (device dm-0): failed to delete reference to 
> > bar, inode 258 parent 257
> >   [71813.674204] ------------[ cut here ]------------
> >   [71813.675694] BTRFS: Transaction aborted (error -2)
> >   [71813.677236] WARNING: CPU: 1 PID: 13231 at fs/btrfs/inode.c:4128 
> > __btrfs_unlink_inode+0x17b/0x355 [btrfs]
> >   [71813.679669] Modules linked in: btrfs xfs f2fs dm_flakey dm_mod dax 
> > ghash_clmulni_intel ppdev pcbc aesni_intel aes_x86_64 crypto_simd cryptd 
> > glue_helper evdev psmouse i2c_piix4 parport_pc i2c_core pcspkr sg serio_raw 
> > parport button sunrpc loop autofs4 ext4 crc16 mbcache jbd2 zstd_decompress 
> > zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq 
> > async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 
> > multipath linear md_mod ata_generic sd_mod virtio_scsi ata_piix libata 
> > virtio_pci virtio_ring crc32c_intel floppy virtio e1000 scsi_mod [last 
> > unloaded: btrfs]
> >   [71813.679669] CPU: 1 PID: 13231 Comm: mount Tainted: G        W        
> > 4.15.0-rc9-btrfs-next-56+ #1
> >   [71813.679669] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
> > BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
> >   [71813.679669] RIP: 0010:__btrfs_unlink_inode+0x17b/0x355 [btrfs]
> >   [71813.679669] RSP: 0018:ffffc90001cef738 EFLAGS: 00010286
> >   [71813.679669] RAX: 0000000000000025 RBX: ffff880217ce4708 RCX: 
> > 0000000000000001
> >   [71813.679669] RDX: 0000000000000000 RSI: ffffffff81c14bae RDI: 
> > 00000000ffffffff
> >   [71813.679669] RBP: ffffc90001cef7c0 R08: 0000000000000001 R09: 
> > 0000000000000001
> >   [71813.679669] R10: ffffc90001cef5e0 R11: ffffffff8343f007 R12: 
> > ffff880217d474c8
> >   [71813.679669] R13: 00000000fffffffe R14: ffff88021ccf1548 R15: 
> > 0000000000000101
> >   [71813.679669] FS:  00007f7cee84c480(0000) GS:ffff88023fc80000(0000) 
> > knlGS:0000000000000000
> >   [71813.679669] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >   [71813.679669] CR2: 00007f7cedc1abf9 CR3: 00000002354b4003 CR4: 
> > 00000000001606e0
> >   [71813.679669] Call Trace:
> >   [71813.679669]  btrfs_unlink_inode+0x17/0x41 [btrfs]
> >   [71813.679669]  drop_one_dir_item+0xfa/0x131 [btrfs]
> >   [71813.679669]  add_inode_ref+0x71e/0x851 [btrfs]
> >   [71813.679669]  ? __lock_is_held+0x39/0x71
> >   [71813.679669]  ? replay_one_buffer+0x53/0x53a [btrfs]
> >   [71813.679669]  replay_one_buffer+0x4a4/0x53a [btrfs]
> >   [71813.679669]  ? rcu_read_unlock+0x3a/0x57
> >   [71813.679669]  ? __lock_is_held+0x39/0x71
> >   [71813.679669]  walk_up_log_tree+0x101/0x1d2 [btrfs]
> >   [71813.679669]  walk_log_tree+0xad/0x188 [btrfs]
> >   [71813.679669]  btrfs_recover_log_trees+0x1fa/0x31e [btrfs]
> >   [71813.679669]  ? replay_one_extent+0x544/0x544 [btrfs]
> >   [71813.679669]  open_ctree+0x1cf6/0x2209 [btrfs]
> >   [71813.679669]  btrfs_mount_root+0x368/0x482 [btrfs]
> >   [71813.679669]  ? trace_hardirqs_on_caller+0x14c/0x1a6
> >   [71813.679669]  ? __lockdep_init_map+0x176/0x1c2
> >   [71813.679669]  ? mount_fs+0x64/0x10b
> >   [71813.679669]  mount_fs+0x64/0x10b
> >   [71813.679669]  vfs_kern_mount+0x68/0xce
> >   [71813.679669]  btrfs_mount+0x13e/0x772 [btrfs]
> >   [71813.679669]  ? trace_hardirqs_on_caller+0x14c/0x1a6
> >   [71813.679669]  ? __lockdep_init_map+0x176/0x1c2
> >   [71813.679669]  ? mount_fs+0x64/0x10b
> >   [71813.679669]  mount_fs+0x64/0x10b
> >   [71813.679669]  vfs_kern_mount+0x68/0xce
> >   [71813.679669]  do_mount+0x6e5/0x973
> >   [71813.679669]  ? memdup_user+0x3e/0x5c
> >   [71813.679669]  SyS_mount+0x72/0x98
> >   [71813.679669]  entry_SYSCALL_64_fastpath+0x1e/0x8b
> >   [71813.679669] RIP: 0033:0x7f7cedf150ba
> >   [71813.679669] RSP: 002b:00007ffca71da688 EFLAGS: 00000206
> >   [71813.679669] Code: 7f a0 e8 51 0c fd ff 48 8b 43 50 f0 0f ba a8 30 2c 
> > 00 00 02 72 17 41 83 fd fb 74 11 44 89 ee 48 c7 c7 7d 11 7f a0 e8 38 f5 8d 
> > e0 <0f> ff 44 89 e9 ba 20 10 00 00 eb 4d 48 8b 4d b0 48 8b 75 88 4c
> >   [71813.679669] ---[ end trace 83bd473fc5b4663b ]---
> >   [71813.854764] BTRFS: error (device dm-0) in __btrfs_unlink_inode:4128: 
> > errno=-2 No such entry
> >   [71813.886994] BTRFS: error (device dm-0) in btrfs_replay_log:2307: 
> > errno=-2 No such entry (Failed to recover log tree)
> >   [71813.903357] BTRFS error (device dm-0): cleaner transaction attach 
> > returned -30
> >   [71814.128078] BTRFS error (device dm-0): open_ctree failed
> > 
> > This happens because the log has inode reference items for both inode 258
> > (the first file we created) and inode 259 (the second file created), and
> > when processing the reference item for inode 258, we replace the
> > corresponding item in the subvolume tree (which has two names, "foo" and
> > "bar") witht he one in the log (which only has one name, "foo") without
> > removing the corresponding dir index keys from the parent directory.
> > Later, when processing the inode reference item for inode 259, which has
> > a name of "bar" associated to it, we notice that dir index entries exist
> > for that name and for a different inode, so we attempt to unlink that
> > name, which fails because the inode reference item for inode 258 no longer
> > has the name "bar" associated to it, making a call to btrfs_unlink_inode()
> > fail with a -ENOENT error.
> > 
> > Fix this by unlinking all the names in an inode reference item from a
> > subvolume tree that are not present in the inode reference item found in
> > the log tree, before overwriting it with the item from the log tree.
> 
> Since the order during replaying is INODE_ITEM then DIR_INDEX then
> other types, in this particular case, can we fix the problem by also
> logging the parent so that we have the correct DIR_INDEX?
> 
> With DIR_INDEX, the problem would be fixed simpler, wouldn't it?
> 
> Thanks,
> 
> -liubo
> > 
> > Signed-off-by: Filipe Manana <fdman...@suse.com>
> > ---
> >  fs/btrfs/ctree.h      |   5 ++-
> >  fs/btrfs/inode-item.c |  44 ++++++++++++--------
> >  fs/btrfs/tree-log.c   | 112 
> > ++++++++++++++++++++++++++++++++++++++++++++++++--
> >  3 files changed, 139 insertions(+), 22 deletions(-)
> > 
> > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > index 1a462ab85c49..5d33478bc704 100644
> > --- a/fs/btrfs/ctree.h
> > +++ b/fs/btrfs/ctree.h
> > @@ -3095,7 +3095,10 @@ btrfs_lookup_inode_extref(struct btrfs_trans_handle 
> > *trans,
> >                       u64 inode_objectid, u64 ref_objectid, int ins_len,
> >                       int cow);
> >  
> > -int btrfs_find_name_in_ext_backref(struct btrfs_path *path,
> > +int btrfs_find_name_in_backref(struct extent_buffer *leaf, int slot,
> > +                          const char *name,
> > +                          int name_len, struct btrfs_inode_ref **ref_ret);
> > +int btrfs_find_name_in_ext_backref(struct extent_buffer *leaf, int slot,
> >                                u64 ref_objectid, const char *name,
> >                                int name_len,
> >                                struct btrfs_inode_extref **extref_ret);
> > diff --git a/fs/btrfs/inode-item.c b/fs/btrfs/inode-item.c
> > index 39c968f80157..65e1a76bf755 100644
> > --- a/fs/btrfs/inode-item.c
> > +++ b/fs/btrfs/inode-item.c
> > @@ -22,10 +22,10 @@
> >  #include "transaction.h"
> >  #include "print-tree.h"
> >  
> > -static int find_name_in_backref(struct btrfs_path *path, const char *name,
> > -                    int name_len, struct btrfs_inode_ref **ref_ret)
> > +int btrfs_find_name_in_backref(struct extent_buffer *leaf, int slot,
> > +                          const char *name,
> > +                          int name_len, struct btrfs_inode_ref **ref_ret)
> >  {
> > -   struct extent_buffer *leaf;
> >     struct btrfs_inode_ref *ref;
> >     unsigned long ptr;
> >     unsigned long name_ptr;
> > @@ -33,9 +33,8 @@ static int find_name_in_backref(struct btrfs_path *path, 
> > const char *name,
> >     u32 cur_offset = 0;
> >     int len;
> >  
> > -   leaf = path->nodes[0];
> > -   item_size = btrfs_item_size_nr(leaf, path->slots[0]);
> > -   ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
> > +   item_size = btrfs_item_size_nr(leaf, slot);
> > +   ptr = btrfs_item_ptr_offset(leaf, slot);
> >     while (cur_offset < item_size) {
> >             ref = (struct btrfs_inode_ref *)(ptr + cur_offset);
> >             len = btrfs_inode_ref_name_len(leaf, ref);
> > @@ -44,18 +43,19 @@ static int find_name_in_backref(struct btrfs_path 
> > *path, const char *name,
> >             if (len != name_len)
> >                     continue;
> >             if (memcmp_extent_buffer(leaf, name, name_ptr, name_len) == 0) {
> > -                   *ref_ret = ref;
> > +                   if (ref_ret)
> > +                           *ref_ret = ref;
> >                     return 1;
> >             }
> >     }
> >     return 0;
> >  }
> >  
> > -int btrfs_find_name_in_ext_backref(struct btrfs_path *path, u64 
> > ref_objectid,
> > +int btrfs_find_name_in_ext_backref(struct extent_buffer *leaf, int slot,
> > +                              u64 ref_objectid,
> >                                const char *name, int name_len,
> >                                struct btrfs_inode_extref **extref_ret)
> >  {
> > -   struct extent_buffer *leaf;
> >     struct btrfs_inode_extref *extref;
> >     unsigned long ptr;
> >     unsigned long name_ptr;
> > @@ -63,9 +63,8 @@ int btrfs_find_name_in_ext_backref(struct btrfs_path 
> > *path, u64 ref_objectid,
> >     u32 cur_offset = 0;
> >     int ref_name_len;
> >  
> > -   leaf = path->nodes[0];
> > -   item_size = btrfs_item_size_nr(leaf, path->slots[0]);
> > -   ptr = btrfs_item_ptr_offset(leaf, path->slots[0]);
> > +   item_size = btrfs_item_size_nr(leaf, slot);
> > +   ptr = btrfs_item_ptr_offset(leaf, slot);
> >  
> >     /*
> >      * Search all extended backrefs in this item. We're only
> > @@ -113,7 +112,9 @@ btrfs_lookup_inode_extref(struct btrfs_trans_handle 
> > *trans,
> >             return ERR_PTR(ret);
> >     if (ret > 0)
> >             return NULL;
> > -   if (!btrfs_find_name_in_ext_backref(path, ref_objectid, name, name_len, 
> > &extref))
> > +   if (!btrfs_find_name_in_ext_backref(path->nodes[0], path->slots[0],
> > +                                       ref_objectid, name, name_len,
> > +                                       &extref))
> >             return NULL;
> >     return extref;
> >  }
> > @@ -155,7 +156,8 @@ static int btrfs_del_inode_extref(struct 
> > btrfs_trans_handle *trans,
> >      * This should always succeed so error here will make the FS
> >      * readonly.
> >      */
> > -   if (!btrfs_find_name_in_ext_backref(path, ref_objectid,
> > +   if (!btrfs_find_name_in_ext_backref(path->nodes[0], path->slots[0],
> > +                                       ref_objectid,
> >                                         name, name_len, &extref)) {
> >             btrfs_handle_fs_error(root->fs_info, -ENOENT, NULL);
> >             ret = -EROFS;
> > @@ -225,7 +227,8 @@ int btrfs_del_inode_ref(struct btrfs_trans_handle 
> > *trans,
> >     } else if (ret < 0) {
> >             goto out;
> >     }
> > -   if (!find_name_in_backref(path, name, name_len, &ref)) {
> > +   if (!btrfs_find_name_in_backref(path->nodes[0], path->slots[0],
> > +                                   name, name_len, &ref)) {
> >             ret = -ENOENT;
> >             search_ext_refs = 1;
> >             goto out;
> > @@ -293,7 +296,9 @@ static int btrfs_insert_inode_extref(struct 
> > btrfs_trans_handle *trans,
> >     ret = btrfs_insert_empty_item(trans, root, path, &key,
> >                                   ins_len);
> >     if (ret == -EEXIST) {
> > -           if (btrfs_find_name_in_ext_backref(path, ref_objectid,
> > +           if (btrfs_find_name_in_ext_backref(path->nodes[0],
> > +                                              path->slots[0],
> > +                                              ref_objectid,
> >                                                name, name_len, NULL))
> >                     goto out;
> >  
> > @@ -351,7 +356,8 @@ int btrfs_insert_inode_ref(struct btrfs_trans_handle 
> > *trans,
> >     if (ret == -EEXIST) {
> >             u32 old_size;
> >  
> > -           if (find_name_in_backref(path, name, name_len, &ref))
> > +           if (btrfs_find_name_in_backref(path->nodes[0], path->slots[0],
> > +                                          name, name_len, &ref))
> >                     goto out;
> >  
> >             old_size = btrfs_item_size_nr(path->nodes[0], path->slots[0]);
> > @@ -365,7 +371,9 @@ int btrfs_insert_inode_ref(struct btrfs_trans_handle 
> > *trans,
> >             ret = 0;
> >     } else if (ret < 0) {
> >             if (ret == -EOVERFLOW) {
> > -                   if (find_name_in_backref(path, name, name_len, &ref))
> > +                   if (btrfs_find_name_in_backref(path->nodes[0],
> > +                                                  path->slots[0],
> > +                                                  name, name_len, &ref))
> >                             ret = -EEXIST;
> >                     else
> >                             ret = -EMLINK;
> > diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> > index 411a022489e4..fd573816f461 100644
> > --- a/fs/btrfs/tree-log.c
> > +++ b/fs/btrfs/tree-log.c
> > @@ -966,7 +966,9 @@ static noinline int backref_in_log(struct btrfs_root 
> > *log,
> >     ptr = btrfs_item_ptr_offset(path->nodes[0], path->slots[0]);
> >  
> >     if (key->type == BTRFS_INODE_EXTREF_KEY) {
> > -           if (btrfs_find_name_in_ext_backref(path, ref_objectid,
> > +           if (btrfs_find_name_in_ext_backref(path->nodes[0],
> > +                                              path->slots[0],
> > +                                              ref_objectid,
> >                                                name, namelen, NULL))
> >                     match = 1;
> >  
> > @@ -1190,7 +1192,8 @@ static int extref_get_fields(struct extent_buffer 
> > *eb, unsigned long ref_ptr,
> >     read_extent_buffer(eb, *name, (unsigned long)&extref->name,
> >                        *namelen);
> >  
> > -   *index = btrfs_inode_extref_index(eb, extref);
> > +   if (index)
> > +           *index = btrfs_inode_extref_index(eb, extref);
> >     if (parent_objectid)
> >             *parent_objectid = btrfs_inode_extref_parent(eb, extref);
> >  
> > @@ -1211,12 +1214,102 @@ static int ref_get_fields(struct extent_buffer 
> > *eb, unsigned long ref_ptr,
> >  
> >     read_extent_buffer(eb, *name, (unsigned long)(ref + 1), *namelen);
> >  
> > -   *index = btrfs_inode_ref_index(eb, ref);
> > +   if (index)
> > +           *index = btrfs_inode_ref_index(eb, ref);
> >  
> >     return 0;
> >  }
> >  
> >  /*
> > + * Take an inode reference item from the log tree and iterate all names 
> > from the
> > + * inode reference item in the subvolume tree with the same key (if it 
> > exists).
> > + * For any name that is not in the inode reference item from the log tree, 
> > do a
> > + * proper unlink of that name (that is, remove its entry from the inode
> > + * reference item and both dir index keys).
> > + */
> > +static int unlink_old_inode_refs(struct btrfs_trans_handle *trans,
> > +                            struct btrfs_root *root,
> > +                            struct btrfs_path *path,
> > +                            struct btrfs_inode *inode,
> > +                            struct extent_buffer *log_eb,
> > +                            int log_slot,
> > +                            struct btrfs_key *key)
> > +{
> > +   int ret;
> > +   unsigned long ref_ptr;
> > +   unsigned long ref_end;
> > +   struct extent_buffer *eb;
> > +
> > +again:
> > +   btrfs_release_path(path);
> > +   ret = btrfs_search_slot(NULL, root, key, path, 0, 0);
> > +   if (ret > 0) {
> > +           ret = 0;
> > +           goto out;
> > +   }
> > +   if (ret < 0)
> > +           goto out;
> > +
> > +   eb = path->nodes[0];
> > +   ref_ptr = btrfs_item_ptr_offset(eb, path->slots[0]);
> > +   ref_end = ref_ptr + btrfs_item_size_nr(eb, path->slots[0]);
> > +   while (ref_ptr < ref_end) {
> > +           char *name = NULL;
> > +           int namelen;
> > +           u64 parent_id;
> > +
> > +           if (key->type == BTRFS_INODE_EXTREF_KEY) {
> > +                   ret = extref_get_fields(eb, ref_ptr, &namelen, &name,
> > +                                           NULL, &parent_id);
> > +           } else {
> > +                   parent_id = key->offset;
> > +                   ret = ref_get_fields(eb, ref_ptr, &namelen, &name,
> > +                                        NULL);
> > +           }
> > +           if (ret)
> > +                   goto out;
> > +
> > +           if (key->type == BTRFS_INODE_EXTREF_KEY)
> > +                   ret = btrfs_find_name_in_ext_backref(log_eb, log_slot,
> > +                                                        parent_id, name,
> > +                                                        namelen, NULL);
> > +           else
> > +                   ret = btrfs_find_name_in_backref(log_eb, log_slot, name,
> > +                                                    namelen, NULL);
> > +
> > +           if (!ret) {
> > +                   struct inode *dir;
> > +
> > +                   btrfs_release_path(path);
> > +                   dir = read_one_inode(root, parent_id);
> > +                   if (!dir) {
> > +                           ret = -ENOENT;
> > +                           kfree(name);
> > +                           goto out;
> > +                   }
> > +                   ret = btrfs_unlink_inode(trans, root, BTRFS_I(dir),
> > +                                            inode, name, namelen);
> > +                   kfree(name);
> > +                   iput(dir);
> > +                   if (ret)
> > +                           goto out;
> > +                   goto again;
> > +           }
> > +
> > +           kfree(name);
> > +           ref_ptr += namelen;
> > +           if (key->type == BTRFS_INODE_EXTREF_KEY)
> > +                   ref_ptr += sizeof(struct btrfs_inode_extref);
> > +           else
> > +                   ref_ptr += sizeof(struct btrfs_inode_ref);
> > +   }
> > +   ret = 0;
> > + out:
> > +   btrfs_release_path(path);
> > +   return ret;
> > +}
> > +
> > +/*
> >   * replay one inode back reference item found in the log tree.
> >   * eb, slot and key refer to the buffer and key found in the log tree.
> >   * root is the destination we are replaying into, and path is for temp
> > @@ -1344,6 +1437,19 @@ static noinline int add_inode_ref(struct 
> > btrfs_trans_handle *trans,
> >             }
> >     }
> >  
> > +   /*
> > +    * Before we overwrite the inode reference item in the subvolume tree
> > +    * with the item from the log tree, we must unlink all names from the
> > +    * parent directory that are in the subvolume's tree inode reference
> > +    * item, otherwise we end up with an inconsistent subvolume tree where
> > +    * dir index entries exist for a name but there is no inode reference
> > +    * item with the same name.
> > +    */
> > +   ret = unlink_old_inode_refs(trans, root, path, BTRFS_I(inode), eb, slot,
> > +                               key);
> > +   if (ret)
> > +           goto out;
> > +
> >     /* finally write the back reference in the inode */
> >     ret = overwrite_item(trans, root, path, eb, slot, key);
> >  out:
> > -- 
> > 2.11.0
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to