On Mon, 27 Oct 2014 13:44:22 +0000, Filipe David Manana wrote: > On Mon, Oct 27, 2014 at 12:11 PM, Filipe David Manana > <fdman...@gmail.com> wrote: >> On Mon, Oct 27, 2014 at 11:08 AM, Miao Xie <mi...@cn.fujitsu.com> wrote: >>> On Mon, 27 Oct 2014 09:19:52 +0000, Filipe Manana wrote: >>>> We have a race that can lead us to miss skinny extent items in the function >>>> btrfs_lookup_extent_info() when the skinny metadata feature is enabled. >>>> So basically the sequence of steps is: >>>> >>>> 1) We search in the extent tree for the skinny extent, which returns > 0 >>>> (not found); >>>> >>>> 2) We check the previous item in the returned leaf for a non-skinny extent, >>>> and we don't find it; >>>> >>>> 3) Because we didn't find the non-skinny extent in step 2), we release our >>>> path to search the extent tree again, but this time for a non-skinny >>>> extent key; >>>> >>>> 4) Right after we released our path in step 3), a skinny extent was >>>> inserted >>>> in the extent tree (delayed refs were run) - our second extent tree >>>> search >>>> will miss it, because it's not looking for a skinny extent; >>>> >>>> 5) After the second search returned (with ret > 0), we look for any delayed >>>> ref for our extent's bytenr (and we do it while holding a read lock on >>>> the >>>> leaf), but we won't find any, as such delayed ref had just run and >>>> completed >>>> after we released out path in step 3) before doing the second search. >>>> >>>> Fix this by removing completely the path release and re-search logic. This >>>> is >>>> safe, because if we seach for a metadata item and we don't find it, we >>>> have the >>>> guarantee that the returned leaf is the one where the item would be >>>> inserted, >>>> and so path->slots[0] > 0 and path->slots[0] - 1 must be the slot where the >>>> non-skinny extent item is if it exists. The only case where path->slots[0] >>>> is >>> >>> I think this analysis is wrong if there are some independent shared ref >>> metadata for >>> a tree block, just like: >>> +------------------------+-------------+-------------+ >>> | tree block extent item | shared ref1 | shared ref2 | >>> +------------------------+-------------+-------------+ > > Trying to guess what's in your mind. > > Is the concern that if after a non-skinny extent item we have > non-inlined references, the assumption that path->slots[0] - 1 points > to the extent item would be wrong when searching for a skinny extent? > > That wouldn't be the case because BTRFS_EXTENT_ITEM_KEY == 168 and > BTRFS_METADATA_ITEM_KEY == 169, with BTRFS_SHARED_BLOCK_REF_KEY == > 182. So in the presence of such non-inlined shared tree block > reference items, searching for a skinny extent item leaves us at a > slot that points to the first non-inlined ref (regardless of its type, > since they're all > 169), and therefore path->slots[0] - 1 is the > non-skinny extent item.
You are right. I forget to check the value of key type. Sorry. This patch seems good for me. Reviewed-by: Miao Xie <mi...@cn.fujitsu.com> > > thanks. > >> >> Why does that matters? Can you elaborate why it's not correct? >> >> We're looking for the extent item only in btrfs_lookup_extent_info(), >> and running a delayed ref, independently of being inlined/shared, it >> implies inserting a new extent item or updating an existing extent >> item (updating ref count). >> >> thanks >> >>> >>> Thanks >>> Miao >>> >>>> zero is when there are no smaller keys in the tree (i.e. no left siblings >>>> for >>>> our leaf), in which case the re-search logic isn't needed as well. >>>> >>>> This race has been present since the introduction of skinny metadata >>>> (change >>>> 3173a18f70554fe7880bb2d85c7da566e364eb3c). >>>> >>>> Signed-off-by: Filipe Manana <fdman...@suse.com> >>>> --- >>>> fs/btrfs/extent-tree.c | 8 -------- >>>> 1 file changed, 8 deletions(-) >>>> >>>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c >>>> index 9141b2b..2cedd06 100644 >>>> --- a/fs/btrfs/extent-tree.c >>>> +++ b/fs/btrfs/extent-tree.c >>>> @@ -780,7 +780,6 @@ search_again: >>>> else >>>> key.type = BTRFS_EXTENT_ITEM_KEY; >>>> >>>> -again: >>>> ret = btrfs_search_slot(trans, root->fs_info->extent_root, >>>> &key, path, 0, 0); >>>> if (ret < 0) >>>> @@ -796,13 +795,6 @@ again: >>>> key.offset == root->nodesize) >>>> ret = 0; >>>> } >>>> - if (ret) { >>>> - key.objectid = bytenr; >>>> - key.type = BTRFS_EXTENT_ITEM_KEY; >>>> - key.offset = root->nodesize; >>>> - btrfs_release_path(path); >>>> - goto again; >>>> - } >>>> } >>>> >>>> if (ret == 0) { >>>> >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >>> the body of a message to majord...@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> >> >> -- >> Filipe David Manana, >> >> "Reasonable men adapt themselves to the world. >> Unreasonable men adapt the world to themselves. >> That's why all progress depends on unreasonable men." > > > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html