On Mon, 27 Oct 2014 13:44:22 +0000, Filipe David Manana wrote:
> On Mon, Oct 27, 2014 at 12:11 PM, Filipe David Manana
> <fdman...@gmail.com> wrote:
>> On Mon, Oct 27, 2014 at 11:08 AM, Miao Xie <mi...@cn.fujitsu.com> wrote:
>>> On Mon, 27 Oct 2014 09:19:52 +0000, Filipe Manana wrote:
>>>> We have a race that can lead us to miss skinny extent items in the function
>>>> btrfs_lookup_extent_info() when the skinny metadata feature is enabled.
>>>> So basically the sequence of steps is:
>>>>
>>>> 1) We search in the extent tree for the skinny extent, which returns > 0
>>>>    (not found);
>>>>
>>>> 2) We check the previous item in the returned leaf for a non-skinny extent,
>>>>    and we don't find it;
>>>>
>>>> 3) Because we didn't find the non-skinny extent in step 2), we release our
>>>>    path to search the extent tree again, but this time for a non-skinny
>>>>    extent key;
>>>>
>>>> 4) Right after we released our path in step 3), a skinny extent was 
>>>> inserted
>>>>    in the extent tree (delayed refs were run) - our second extent tree 
>>>> search
>>>>    will miss it, because it's not looking for a skinny extent;
>>>>
>>>> 5) After the second search returned (with ret > 0), we look for any delayed
>>>>    ref for our extent's bytenr (and we do it while holding a read lock on 
>>>> the
>>>>    leaf), but we won't find any, as such delayed ref had just run and 
>>>> completed
>>>>    after we released out path in step 3) before doing the second search.
>>>>
>>>> Fix this by removing completely the path release and re-search logic. This 
>>>> is
>>>> safe, because if we seach for a metadata item and we don't find it, we 
>>>> have the
>>>> guarantee that the returned leaf is the one where the item would be 
>>>> inserted,
>>>> and so path->slots[0] > 0 and path->slots[0] - 1 must be the slot where the
>>>> non-skinny extent item is if it exists. The only case where path->slots[0] 
>>>> is
>>>
>>> I think this analysis is wrong if there are some independent shared ref 
>>> metadata for
>>> a tree block, just like:
>>> +------------------------+-------------+-------------+
>>> | tree block extent item | shared ref1 | shared ref2 |
>>> +------------------------+-------------+-------------+
> 
> Trying to guess what's in your mind.
> 
> Is the concern that if after a non-skinny extent item we have
> non-inlined references, the assumption that path->slots[0] - 1 points
> to the extent item would be wrong when searching for a skinny extent?
> 
> That wouldn't be the case because BTRFS_EXTENT_ITEM_KEY == 168 and
> BTRFS_METADATA_ITEM_KEY == 169, with BTRFS_SHARED_BLOCK_REF_KEY ==
> 182. So in the presence of such non-inlined shared tree block
> reference items, searching for a skinny extent item leaves us at a
> slot that points to the first non-inlined ref (regardless of its type,
> since they're all > 169), and therefore path->slots[0] - 1 is the
> non-skinny extent item.

You are right. I forget to check the value of key type. Sorry.

This patch seems good for me.

Reviewed-by: Miao Xie <mi...@cn.fujitsu.com>

> 
> thanks.
> 
>>
>> Why does that matters? Can you elaborate why it's not correct?
>>
>> We're looking for the extent item only in btrfs_lookup_extent_info(),
>> and running a delayed ref, independently of being inlined/shared, it
>> implies inserting a new extent item or updating an existing extent
>> item (updating ref count).
>>
>> thanks
>>
>>>
>>> Thanks
>>> Miao
>>>
>>>> zero is when there are no smaller keys in the tree (i.e. no left siblings 
>>>> for
>>>> our leaf), in which case the re-search logic isn't needed as well.
>>>>
>>>> This race has been present since the introduction of skinny metadata 
>>>> (change
>>>> 3173a18f70554fe7880bb2d85c7da566e364eb3c).
>>>>
>>>> Signed-off-by: Filipe Manana <fdman...@suse.com>
>>>> ---
>>>>  fs/btrfs/extent-tree.c | 8 --------
>>>>  1 file changed, 8 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>>>> index 9141b2b..2cedd06 100644
>>>> --- a/fs/btrfs/extent-tree.c
>>>> +++ b/fs/btrfs/extent-tree.c
>>>> @@ -780,7 +780,6 @@ search_again:
>>>>       else
>>>>               key.type = BTRFS_EXTENT_ITEM_KEY;
>>>>
>>>> -again:
>>>>       ret = btrfs_search_slot(trans, root->fs_info->extent_root,
>>>>                               &key, path, 0, 0);
>>>>       if (ret < 0)
>>>> @@ -796,13 +795,6 @@ again:
>>>>                           key.offset == root->nodesize)
>>>>                               ret = 0;
>>>>               }
>>>> -             if (ret) {
>>>> -                     key.objectid = bytenr;
>>>> -                     key.type = BTRFS_EXTENT_ITEM_KEY;
>>>> -                     key.offset = root->nodesize;
>>>> -                     btrfs_release_path(path);
>>>> -                     goto again;
>>>> -             }
>>>>       }
>>>>
>>>>       if (ret == 0) {
>>>>
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>> the body of a message to majord...@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>>
>> --
>> Filipe David Manana,
>>
>> "Reasonable men adapt themselves to the world.
>>  Unreasonable men adapt the world to themselves.
>>  That's why all progress depends on unreasonable men."
> 
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to