On 06/28/2018 04:02 PM, Qu Wenruo wrote:
Also CC Anand Jain, since he is also working on various device related
work, and an expert on this.

Especially I'm not pretty sure if such enhancement is already on his
agenda or there are already some unmerged patch for this.

 Right, some of the patches are already in the ML and probably it needs
 a revival. Unless there are new challenges, my target is to consolidate
 them and get them integrated by this year. I am trying harder.

 I think its a good idea to report the write-hole status to the
 user-land instead of failing the mount, so that a script can trigger
 the re-silver as required by the use case. I introduced
 fs_devices::volume_flags, which is under review as of now, and have
 plans to add the volume degraded state bit which can be accessed by
 the sysfs. So yes this is been taken care.


Thanks, Anand


Thanks,
Qu

On 2018年06月28日 15:04, Qu Wenruo wrote:
There is a reporter considering btrfs raid1 has a major design flaw
which can't handle nodatasum files.

Despite his incorrect expectation, btrfs indeed doesn't handle device
generation mismatch well.

This means if one devices missed and re-appeared, even its generation
no longer matches with the rest device pool, btrfs does nothing to it,
but treat it as normal good device.

At least let's detect such generation mismatch and avoid mounting the
fs.
Currently there is no automatic rebuild yet, which means if users find
device generation mismatch error message, they can only mount the fs
using "device" and "degraded" mount option (if possible), then replace
the offending device to manually "rebuild" the fs.

Signed-off-by: Qu Wenruo <w...@suse.com>
---
I totally understand that, generation based solution can't handle
split-brain case (where 2 RAID1 devices get mounted degraded separately)
at all, but at least let's handle what we can do.

The best way to solve the problem is to make btrfs treat such lower gen
devices as some kind of missing device, and queue an automatic scrub for
that device.
But that's a lot of extra work, at least let's start from detecting such
problem first.
---
  fs/btrfs/volumes.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++
  1 file changed, 50 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e034ad9e23b4..80a7c44993bc 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6467,6 +6467,49 @@ static void btrfs_report_missing_device(struct 
btrfs_fs_info *fs_info,
                              devid, uuid);
  }
+static int verify_devices_generation(struct btrfs_fs_info *fs_info,
+                                    struct btrfs_device *dev)
+{
+       struct btrfs_fs_devices *fs_devices = dev->fs_devices;
+       struct btrfs_device *cur;
+       bool warn_only = false;
+       int ret = 0;
+
+       if (!fs_devices || fs_devices->seeding || !dev->generation)
+               return 0;
+
+       /*
+        * If we're not replaying log, we're completely safe to allow
+        * generation mismatch as it won't write anything to disks, nor
+        * remount to rw.
+        */
+       if (btrfs_test_opt(fs_info, NOLOGREPLAY))
+               warn_only = true;
+
+       rcu_read_lock();
+       list_for_each_entry_rcu(cur, &fs_devices->devices, dev_list) {
+               if (cur->generation && cur->generation != dev->generation) {
+                       if (warn_only) {
+                               btrfs_warn_rl_in_rcu(fs_info,
+       "devid %llu has unexpected generation, has %llu expected %llu",
+                                                    dev->devid,
+                                                    dev->generation,
+                                                    cur->generation);
+                       } else {
+                               btrfs_err_rl_in_rcu(fs_info,
+       "devid %llu has unexpected generation, has %llu expected %llu",
+                                                    dev->devid,
+                                                    dev->generation,
+                                                    cur->generation);
+                               ret = -EINVAL;




+                               break;
+                       }
+               }
+       }
+       rcu_read_unlock();
+       return ret;
+}
+
  static int read_one_chunk(struct btrfs_fs_info *fs_info, struct btrfs_key 
*key,
                          struct extent_buffer *leaf,
                          struct btrfs_chunk *chunk)
@@ -6552,6 +6595,13 @@ static int read_one_chunk(struct btrfs_fs_info *fs_info, 
struct btrfs_key *key,
                                return PTR_ERR(map->stripes[i].dev);
                        }
                        btrfs_report_missing_device(fs_info, devid, uuid, 
false);
+               } else {
+                       ret = verify_devices_generation(fs_info,
+                                                       map->stripes[i].dev);
+                       if (ret < 0) {
+                               free_extent_map(em);
+                               return ret;
+                       }
                }
                set_bit(BTRFS_DEV_STATE_IN_FS_METADATA,
                                &(map->stripes[i].dev->dev_state));


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to