On Sun, Dec 15, 2013 at 3:34 AM, Shilong Wang <[email protected]> wrote: > 2013/12/14 Filipe David Manana <[email protected]>: >> On Sat, Dec 14, 2013 at 3:13 PM, Filipe David Manana <[email protected]> >> wrote: >>> On Sat, Dec 14, 2013 at 3:08 PM, Shilong Wang <[email protected]> >>> wrote: >>>> 2013/12/14 Filipe David Manana <[email protected]>: >>>>> On Sat, Dec 14, 2013 at 2:56 PM, Shilong Wang <[email protected]> >>>>> wrote: >>>>>> Hello Filipe, >>>>>> >>>>>> 2013/12/14 Filipe David Borba Manana <[email protected]>: >>>>>>> Wang Shilong got into a case where during inode eviction we were >>>>>>> removing an extent map while it was pinned. This triggered a warning >>>>>>> in remove_extent_mapping() because the extent map had the pinned >>>>>>> flag set: >>>>>>> >>>>>>> [ 1209.102076] [<ffffffffa04721b9>] remove_extent_mapping+0x69/0x70 >>>>>>> [btrfs] >>>>>>> [ 1209.102084] [<ffffffffa0466b06>] btrfs_evict_inode+0x96/0x4d0 >>>>>>> [btrfs] >>>>>>> [ 1209.102089] [<ffffffff81073010>] ? wake_atomic_t_function+0x40/0x40 >>>>>>> [ 1209.102092] [<ffffffff8118ab2e>] evict+0x9e/0x190 >>>>>>> [ 1209.102094] [<ffffffff8118b313>] iput+0xf3/0x180 >>>>>>> [ 1209.102101] [<ffffffffa0461fd1>] btrfs_run_delayed_iputs+0xb1/0xd0 >>>>>>> [btrfs] >>>>>>> [ 1209.102107] [<ffffffffa045d358>] >>>>>>> __btrfs_end_transaction+0x268/0x350 [btrfs] >>>>>>> >>>>>>> Therefore wait for any pending ordered extents, if any, which will >>>>>>> trigger calls to unpin_extent_cache(), before removing the extent maps. >>>>>>> >>>>>>> Wang's solution of simply clearing the pinned bit wasn't enough, as >>>>>>> after >>>>>>> unpin_extent_cache() will be called and trigger another WARN_ON() >>>>>>> because >>>>>>> the lookup for the extent map returned NULL. >>>>>> >>>>>> Why not in evict_inode_truncate_pages() move remove_extent_mapping() >>>>>> after >>>>>> clear_extent_bit()? >>>>> >>>>> So, if the pinned bit is set, it means some task will clear it later, >>>>> via unpin_extent_cache(). And if you look at that function, it has >>>>> this: >>>>> >>>>> write_lock(&tree->lock); >>>>> em = lookup_extent_mapping(tree, start, len); >>>>> >>>>> WARN_ON(!em || em->start != start); >>>>> >>>>> And remove_extent_mapping() will remove the em from the rbtree, >>>>> regardless of its reference count value, therefore triggering that >>>>> warning above. >>>> >>>> Here i mean, in evict_inode_truncate_pages() >>>> We change it to: >>>> >>>> Step1: unpin_extent_cache() >>>> Step2: remove it from extent_mapping >>>> >>>> Dose this cause any problems? i am a little confused, correct me if i >>>> am wrong some places^_^. >>> >>> It can still lead to the same WARN_ON I think. So when calling >>> unpin_extent_cache(), it can merge the em with its left neighbor, >>> therefore changing its ->start value. So later, if other task (the one >>> which set the pinned flag) calls remove_extent_mapping(), it will get >>> an em with a different ->start (because of the merge), therefore >>> triggering that WARN_ON(). >> >> Or because it is not found the second time. >> >> On the other hand, you didn't get such WARN_ON triggered, right? > > During my test, i did not hit another WARN_ON in fact. > >> >> So maybe just clearing the pinned bit is ok. So btrfs_invalidatepage, >> if it finds an ordered extent, it sets the BTRFS_ORDERED_TRUNCATED >> flag on it, and then it might call btrfs_finish_ordered_io() against >> it, which not always unpins the extent when it has the truncated flag >> set. So this might well be what you ran into. > > Let's dig more. > > I take a deep look in btrfs_finish_ordered_io(), it won't unset PINNED flag > For an NOCOW extent. > > 2579 if (test_bit(BTRFS_ORDERED_NOCOW, > &ordered_extent->flags)) { > 2580 BUG_ON(!list_empty(&ordered_extent->list)); /* > Logic error */ > 2581 btrfs_ordered_update_i_size(inode, 0, ordered_extent); > 2582 if (nolock) > 2583 trans = btrfs_join_transaction_nolock(root); > 2584 else > 2585 trans = btrfs_join_transaction(root); > 2586 if (IS_ERR(trans)) { > 2587 ret = PTR_ERR(trans); > 2588 trans = NULL; > 2589 goto out; > 2590 } > 2591 trans->block_rsv = &root->fs_info->delalloc_block_rsv; > 2592 ret = btrfs_update_inode_fallback(trans, root, inode); > 2593 if (ret) /* -ENOMEM or corruption */ > 2594 btrfs_abort_transaction(trans, root, ret); > 2595 goto out; > ------------------------->here we goto out directly, > unpin_extent_cache() won't be called. > 2596 } > I don't know why we do something like this... > > > Previously, i unset PINNED flag directly is a lazy approach, that is > because i think it > dosen't hurt to do so, because we are going to evict that inode, and after > btrfs_invalidatepages() is called, nobody should can still access those pages. > > > Besides, i think we don't need call btrfs_wait_ordered_extents() in > evicting inode here.. > (Expecially after your patch that we remove extent map cache).... > > > What do you think ^_^
I think it might be ok. Thanks > > >> >> I'm ok with your approach too. >> >> thanks >> >> >>> >>> What do you think? >>> >>> thanks >>> >>>> >>>> >>>>> >>>>> Does it makes sense? >>>>> >>>>> thanks >>>>> >>>>>> >>>>>> Thanks, >>>>>> Wang >>>>>>> >>>>>>> Thanks Wang for finding out this. >>>>>>> >>>>>>> Signed-off-by: Filipe David Borba Manana <[email protected]> >>>>>>> --- >>>>>>> fs/btrfs/inode.c | 5 +++-- >>>>>>> 1 file changed, 3 insertions(+), 2 deletions(-) >>>>>>> >>>>>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c >>>>>>> index e889779..c2933fb 100644 >>>>>>> --- a/fs/btrfs/inode.c >>>>>>> +++ b/fs/btrfs/inode.c >>>>>>> @@ -4509,6 +4509,9 @@ static void evict_inode_truncate_pages(struct >>>>>>> inode *inode) >>>>>>> ASSERT(inode->i_state & I_FREEING); >>>>>>> truncate_inode_pages(&inode->i_data, 0); >>>>>>> >>>>>>> + /* do we really want it for ->i_nlink > 0 and zero >>>>>>> btrfs_root_refs? */ >>>>>>> + btrfs_wait_ordered_range(inode, 0, (u64)-1); >>>>>>> + >>>>>>> write_lock(&map_tree->lock); >>>>>>> while (!RB_EMPTY_ROOT(&map_tree->map)) { >>>>>>> struct extent_map *em; >>>>>>> @@ -4566,8 +4569,6 @@ void btrfs_evict_inode(struct inode *inode) >>>>>>> btrfs_orphan_del(NULL, inode); >>>>>>> goto no_delete; >>>>>>> } >>>>>>> - /* do we really want it for ->i_nlink > 0 and zero >>>>>>> btrfs_root_refs? */ >>>>>>> - btrfs_wait_ordered_range(inode, 0, (u64)-1); >>>>>>> >>>>>>> if (root->fs_info->log_root_recovering) { >>>>>>> BUG_ON(test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM, >>>>>>> -- >>>>>>> 1.7.9.5 >>>>>>> >>>>>>> -- >>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" >>>>>>> in >>>>>>> the body of a message to [email protected] >>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>>> >>>>> >>>>> >>>>> -- >>>>> Filipe David Manana, >>>>> >>>>> "Reasonable men adapt themselves to the world. >>>>> Unreasonable men adapt the world to themselves. >>>>> That's why all progress depends on unreasonable men." >>> >>> >>> >>> -- >>> Filipe David Manana, >>> >>> "Reasonable men adapt themselves to the world. >>> Unreasonable men adapt the world to themselves. >>> That's why all progress depends on unreasonable men." >> >> >> >> -- >> Filipe David Manana, >> >> "Reasonable men adapt themselves to the world. >> Unreasonable men adapt the world to themselves. >> That's why all progress depends on unreasonable men." -- Filipe David Manana, "Reasonable men adapt themselves to the world. Unreasonable men adapt the world to themselves. That's why all progress depends on unreasonable men." -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to [email protected] More majordomo info at http://vger.kernel.org/majordomo-info.html
