On 2019/1/8 下午7:45, Qu Wenruo wrote:
> 
> 
> On 2019/1/8 下午7:16, Filipe Manana wrote:
>> On Tue, Jan 8, 2019 at 6:11 AM Qu Wenruo <[email protected]> wrote:
>>>
>>> [BUG]
>>> Linux v5.0-rc1 will fail fstests/btrfs/163 with the following kernel
>>> message:
>>>
>>>   BTRFS error (device dm-6): dev extent devid 1 physical offset 13631488 
>>> len 8388608 is beyond device boundary 0
>>>   BTRFS error (device dm-6): failed to verify dev extents against chunks: 
>>> -117
>>>   BTRFS error (device dm-6): open_ctree failed
>>>
>>> [CAUSE]
>>> Commit cf90d884b347 ("btrfs: Introduce mount time chunk <-> dev extent
>>> mapping check") introduced strict check on dev extents.
>>>
>>> We use btrfs_find_device() with dev uuid and fs uuid set to NULL, and
>>> only dependent on @devid to find the real device.
>>>
>>> For seed devices, we call clone_fs_devices() in open_seed_devices() to
>>> allow us search seed devices directly.
>>>
>>> However clone_fs_devices() just populates devices with devid and dev
>>> uuid, without populating other essential members, like disk_total_bytes.
>>>
>>> This makes any device returned by btrfs_find_device(fs_info, devid,
>>> NULL, NULL) is just a dummy, with 0 disk_total_bytes, and any dev
>>> extents on the seed device will not pass the device boundary check.
>>>
>>> [FIX]
>>> This patch will try to verify the device returned by btrfs_find_device()
>>> and if it's a dummy then re-search in seed devices.
>>>
>>> Reported-by: Filipe Manana <[email protected]>
>>> Fixes: cf90d884b347 ("btrfs: Introduce mount time chunk <-> dev extent 
>>> mapping check")
>>> Signed-off-by: Qu Wenruo <[email protected]>
>>> ---
>>>  fs/btrfs/volumes.c | 12 ++++++++++++
>>>  1 file changed, 12 insertions(+)
>>>
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index 2576b1a379c9..3e4f8f88353e 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -7825,6 +7825,18 @@ static int verify_one_dev_extent(struct 
>>> btrfs_fs_info *fs_info,
>>>                 ret = -EUCLEAN;
>>>                 goto out;
>>>         }
>>> +
>>> +       /* It's possible this device is a dummy for seed device */
>>> +       if (dev->disk_total_bytes == 0) {
>>> +               dev = find_device(fs_info->fs_devices->seed, devid, NULL);
>>> +               if (!dev) {
>>> +                       btrfs_err(fs_info, "failed to find seed devid %llu",
>>> +                                 devid);
>>> +                       ret = -EUCLEAN;
>>> +                       goto out;
>>> +               }
>>> +       }
>>
>> Why just not pass the FSID (can be taken from fs_info->super_copy) to
>> the previous call to btrfs_find_device()?
>> It's a lot simpler.
> 
> Then btrfs_find_device() will just return NULL.
> We still need to verify the dev extent of the seed device not to exceed
> seed device boundary.

OK, it's not the case. Even we can pass correct dev uuid into
btrfs_find_device(), it will still give us the dummy device, with 0
disk_total_bytes.

The problem is, we have a screwed up fs_info->fs_devices for seed device
from the very beginning.

For a seeding fs, devid 1 is seed device and devid 2 is real rw device.
Then our fs_devices looks like:

fs_info->fs_devices
|- devices
|  |- devid 1
|  |  dummy, devid 1 dev uuid, any else is unpopulated
|  |  created by open_seed_devices()-> clone_fs_devices()
|  |- devid 2
|     real one, devid 2 dev uuid, everything is good
|- seeding
   |- devices
      |- devid 1
         real seed device.

So, at btrfs_find_device() call time, we must pass the *seeding* fsid,
to locate the real seed device.
Or we can only get a dummy.

This patch is just a quick and dirty fix.

The correct way to fix is to not screw up fs_devices any more with such
meaningless dummy in fs_devices.
Especially when btrfs_find_device() has already handled such case.

I don't understand why we do such meaningless clone_fs_devices() call,
but I don't think the proper fix may catch the v5.0 window.

Thanks,
Qu


> 
> What we need is device uuid, however we can't easily get that dev uuid
> from dev extent item.
> 
> Thanks,
> Qu
> 
>>
>>> +
>>>         if (physical_offset + physical_len > dev->disk_total_bytes) {
>>>                 btrfs_err(fs_info,
>>>  "dev extent devid %llu physical offset %llu len %llu is beyond device 
>>> boundary %llu",
>>> --
>>> 2.20.1
>>>
>>
>>
> 

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to