On 10/8/19 1:03 AM, David Sterba wrote:
On Mon, Oct 07, 2019 at 09:37:49PM +0800, Qu Wenruo wrote:
On 2019/10/7 下午9:30, Nikolay Borisov wrote:
On 7.10.19 г. 12:45 ч., Anand Jain wrote:
Following test case explains it all, even though the degraded mount is
successful the btrfs-progs fails to report the missing device.
mkfs.btrfs -fq -draid1 -mraid1 /dev/sdc /dev/sdd && \
wipefs -a /dev/sdd && mount -o degraded /dev/sdc /btrfs && \
btrfs fi show -m /btrfs
Label: none uuid: 2b3b8d92-572b-4d37-b4ee-046d3a538495
Total devices 2 FS bytes used 128.00KiB
devid 1 size 1.09TiB used 2.01GiB path /dev/sdc
devid 2 size 1.09TiB used 2.01GiB path /dev/sdd
This is because btrfs-progs does it fundamentally wrong way that
it deduces the missing device status in the user land instead of
refuting from the kernel.
At the same time in the kernel when we know that there is device
with non-btrfs magic, then remove that device from the list so
that btrfs-progs or someother userland utility won't be confused.
Signed-off-by: Anand Jain <anand.j...@oracle.com>
---
fs/btrfs/disk-io.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 326d5281ad93..e05856432456 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3417,7 +3417,7 @@ int btrfs_read_dev_one_super(struct block_device *bdev,
int copy_num,
if (btrfs_super_bytenr(super) != bytenr ||
btrfs_super_magic(super) != BTRFS_MAGIC) {
brelse(bh);
- return -EINVAL;
+ return -EUCLEAN;
This is really non-obvious and you are propagating the special-meaning
of EUCLEAN waaaaaaaay beyond btrfs_open_one_device. In fact what this
patch does is make the following call chain return EUCLAN:
btrfs_open_one_device <-- finally removing the device in this function
btrfs_get_bdev_and_sb <-- propagating it to here
btrfs_read_dev_super
btrfs_read_dev_one_super <-- you return the EUCLEAN
And your commit log doesn't mention anything about that. EUCLEAN
warrants a comment in this case since it changes behavior in
higher-level layers.
Ok. Which means the commit log needs to be fixed.
And, for most case, EUCLEAN should have a proper kernel message for
what's going wrong, the value we hit and the value we expect.
If its a kernel message for EUCLEAN in the context of mounted state,
I would say yes, as it indicates a corruption.
But here we are still in the unmounted context and moving towards
mounted context. Having an alien device in the fs_devices is not
something logical bug nor a corruption, it just about a disk whose
disk superblock got overwritten by the user operation. And its not
a good idea to log the kernel messages arising from the user
operation especially in the unmounted context. Otherwise we shall
endup cluttering the kernel messages. Moreover if there is an alien
device, the user must use -o degraded and we do have the missing
kernel messages, and this will suffice to explain the state about the
fsid being mounted. And the alien fsid, its a stale we don't worry
about it. So no kernel messages.
And for EUCLEAN, it's more like graceful error out for impossible thing.
This is definitely not the case, and I believe the original EINVAL makes
more sense than EUCLEAN.
I agree, EUCLEAN is wrong here.
I am ok with any other suitable. It does not matter. And the other
choice I have is ESTALE.
But EINVAL is no go because we still want to keep the messed up device
unless there is a confirmation that its alien.
In the same function we use EINVAL if the device/partition size
changed to smaller size after we have scanned the device. Which
means we still keep the device in the list. Its the user choice,
no need to meddle with it.
bytenr = btrfs_sb_offset(copy_num);
if (bytenr + BTRFS_SUPER_INFO_SIZE >= i_size_read(bdev->bd_inode))
return -EINVAL;
But, EUCLEAN /* structure needs cleaning */ is errno which exactly
says whats needed here. Because an alien device is a kind of
corruption but it can easily happen due to unplanned device operations
in the userland. But as it happened before we assembled the volume so
its safe to clean and is not a road block when there is redundancy
with degraded option.
Thanks, Anand