> On 7 Aug 2019, at 7:15 PM, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
> 
> 
> 
> On 2019/8/7 下午5:43, Anand Jain wrote:
>> 
>> 
>>> On 7 Aug 2019, at 4:55 PM, Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
>>> 
>>> 
>>> 
>>> On 2019/8/7 下午4:44, Filipe Manana wrote:
>>>> On Wed, Aug 7, 2019 at 9:35 AM Anand Jain <anand.j...@oracle.com> wrote:
>>>>> 
>>>>> Commit 6ba9fc8e628b (btrfs: Ensure btrfs_trim_fs can trim the whole
>>>>> filesystem) makes sure we always trim starting from the first block group.
>>>>> However it also removed the range.start validity check which is set in the
>>>>> context of the user, where its range is from 0 to maximum of filesystem
>>>>> totalbytes and so we have to check its validity in the kernel.
>>>>> 
>>>>> Also as in the fstrim(8) [1] the kernel layers may modify the trim range.
>>>>> 
>>>>> [1]
>>>>> Further, the kernel block layer reserves the right to adjust the discard
>>>>> ranges to fit raid stripe geometry, non-trim capable devices in a LVM
>>>>> setup, etc. These reductions would not be reflected in fstrim_range.len
>>>>> (the --length option).
>>>>> 
>>>>> This patch undos the deleted range::start validity check.
>>>>> 
>>>>> Fixes: 6ba9fc8e628b (btrfs: Ensure btrfs_trim_fs can trim the whole 
>>>>> filesystem)
>>>>> Signed-off-by: Anand Jain <anand.j...@oracle.com>
>>>>> ---
>>>>> With this patch fstests generic/260 is successful now.
>>>>> 
>>>>> fs/btrfs/ioctl.c | 2 ++
>>>>> 1 file changed, 2 insertions(+)
>>>>> 
>>>>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>>>>> index b431f7877e88..9345fcdf80c7 100644
>>>>> --- a/fs/btrfs/ioctl.c
>>>>> +++ b/fs/btrfs/ioctl.c
>>>>> @@ -521,6 +521,8 @@ static noinline int btrfs_ioctl_fitrim(struct file 
>>>>> *file, void __user *arg)
>>>>>               return -EOPNOTSUPP;
>>>>>       if (copy_from_user(&range, arg, sizeof(range)))
>>>>>               return -EFAULT;
>>>>> +       if (range.start > btrfs_super_total_bytes(fs_info->super_copy))
>>>>> +               return -EINVAL;
>>>> 
>>>> This makes it impossible to trim block groups that start at an offset
>>>> greater then the super_total_bytes values.
>>> 
>> 
>> what did I miss? As there is no limit on the range::length
>> so we can still trim beyond super_total_bytes. So I don’t
>> understand why not? More below.
>> 
>> 
>>> Exactly.
>>> 
>>>> In fact, in extreme cases
>>>> it's possible all block groups start at offsets larger then
>>>> super_total_bytes.
>>>> Have you considered that, or am I missing something?
>>> 
>>> And, I have already mentioned exactly the same reason in that commit
>>> message.
>>> 
>>> To address it once again, the bytenr is btrfs logical address space, has
>>> nothing to do with any device.
>>> Thus it can be [0, U64MAX].
>>> 
>> 
>> Fundamentally, logical address space has no relevance in the user context,
>> so also I don’t understand your view on how anyone shall use the range::start
>> even if there is no check?
> 
> range::start == bg_bytenr, range::len = bg_len to trim only a bg.
> 

 Thanks for the efforts in explaining.

 My point is- it should not be one single bg but it should rather be all bg(s) 
in the specified range [start, length] so the %range.start=0 and 
%range.length=<U64MAX/total_bytes> should trim all the bg(s). May be your next 
question is- as we relocate the chunks how would the user ever know correct 
range.start to use? for which I don’t have an answer and the same question 
again applies to your proposal range.start=[0 to U64MAX] as well.

 So I am asking you again, even if you allow range.start=[0 to U64MAX] how will 
the user use it? Can you please explain?


> And that bg_bytenr is at 128T, since the fs has gone through several
> balance.
> But there is only one device, and its size is only 1T.
> 
> Please tell me how to trim that block group only?
> 

 Block groups are something internal the users don’t have to worry about it. 
The range is [0 to totalbytes] for start and [0 to U64MAX] for len is fair.

>> 
>> As in the man page it's ok to adjust the range internally, and as length
>> can be up to U64MAX we can still trim beyond super_total_bytes?
> 
> As I said already, super_total_bytes makes no sense in logical address
> space.

 But super_total_bytes makes sense in the user land though, on the other hand 
logical address space which you are trying to expose to the user land does not 
make sense to me.

> As super_total_bytes is just the sum of all devices, it's a device layer
> thing, nothing to do with logical address space.
> 
> You're mixing logical bytenr with something not even a device physical
> offset, how can it be correct?
> 
> Let me make it more clear, btrfs has its own logical address space
> unrelated to whatever the devices mapping are.
> It's always [0, U64_MAX], no matter how many devices there are.
> 
> If btrfs is implemented using dm, it should be more clear.
> 
> (single device btrfs)
>          |
> (dm linear, 0 ~ U64_MAX, virtual devices)<- that's logical address space
>  |   |   |    |
>  |   |   |    \- (dm raid1, 1T ~ 1T + 128M, devid1 XXX, devid2 XXX)
>  |   |   \------ (dm raid0, 2T ~ 2T + 1G, devid1 XXX, devid2 XXX)
>  |   \---------- (dm raid1, 128G ~ 128G + 128M, devi1 XXX, devid xxx)
>  \-------------- (dm raid0, 1M ~ 1M + 1G, devid1 xxx, devid2 xxx).
> 
> If we're trim such fs layout, you tell me which offset you should use.
> 

 There is no perfect solution, the nearest solution I can think - map 
range.start and range.len to the physical disk range and search and discard 
free spaces in that range. This idea may be ok for raids/linear profiles, but 
again as btrfs can relocate the chunks its not perfect.

Thanks, Anand


> Thanks,
> Qu
> 
>> 
>> Thanks, Anand
>> 
>> 
>>> Thanks,
>>> Qu
>>> 
>>>> 
>>>> The change log is also vague to me, doesn't explain why you are
>>>> re-adding that check.
>>>> 
>>>> Thanks.
>>>> 
>>>>> 
>>>>>       /*
>>>>>        * NOTE: Don't truncate the range using super->total_bytes.  Bytenr 
>>>>> of
>>>>> --
>>>>> 2.21.0 (Apple Git-120)

Reply via email to