On Mon, Oct 22, 2018 at 11:05:08PM +0100, Filipe Manana wrote: > On Mon, Oct 22, 2018 at 8:18 PM Josef Bacik <jo...@toxicpanda.com> wrote: > > > > On Mon, Oct 22, 2018 at 08:10:37PM +0100, fdman...@kernel.org wrote: > > > From: Filipe Manana <fdman...@suse.com> > > > > > > When we are writing out a free space cache, during the transaction commit > > > phase, we can end up in a deadlock which results in a stack trace like the > > > following: > > > > > > schedule+0x28/0x80 > > > btrfs_tree_read_lock+0x8e/0x120 [btrfs] > > > ? finish_wait+0x80/0x80 > > > btrfs_read_lock_root_node+0x2f/0x40 [btrfs] > > > btrfs_search_slot+0xf6/0x9f0 [btrfs] > > > ? evict_refill_and_join+0xd0/0xd0 [btrfs] > > > ? inode_insert5+0x119/0x190 > > > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > > > ? kmem_cache_alloc+0x166/0x1d0 > > > btrfs_iget+0x113/0x690 [btrfs] > > > __lookup_free_space_inode+0xd8/0x150 [btrfs] > > > lookup_free_space_inode+0x5b/0xb0 [btrfs] > > > load_free_space_cache+0x7c/0x170 [btrfs] > > > ? cache_block_group+0x72/0x3b0 [btrfs] > > > cache_block_group+0x1b3/0x3b0 [btrfs] > > > ? finish_wait+0x80/0x80 > > > find_free_extent+0x799/0x1010 [btrfs] > > > btrfs_reserve_extent+0x9b/0x180 [btrfs] > > > btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs] > > > __btrfs_cow_block+0x11d/0x500 [btrfs] > > > btrfs_cow_block+0xdc/0x180 [btrfs] > > > btrfs_search_slot+0x3bd/0x9f0 [btrfs] > > > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > > > ? kmem_cache_alloc+0x166/0x1d0 > > > btrfs_update_inode_item+0x46/0x100 [btrfs] > > > cache_save_setup+0xe4/0x3a0 [btrfs] > > > btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs] > > > btrfs_commit_transaction+0xcb/0x8b0 [btrfs] > > > > > > At cache_save_setup() we need to update the inode item of a block group's > > > cache which is located in the tree root (fs_info->tree_root), which means > > > that it may result in COWing a leaf from that tree. If that happens we > > > need to find a free metadata extent and while looking for one, if we find > > > a block group which was not cached yet we attempt to load its cache by > > > calling cache_block_group(). However this function will try to load the > > > inode of the free space cache, which requires finding the matching inode > > > item in the tree root - if that inode item is located in the same leaf as > > > the inode item of the space cache we are updating at cache_save_setup(), > > > we end up in a deadlock, since we try to obtain a read lock on the same > > > extent buffer that we previously write locked. > > > > > > So fix this by skipping the loading of free space caches of any block > > > groups that are not yet cached (rare cases) if we are COWing an extent > > > buffer from the root tree and space caching is enabled (-o space_cache > > > mount option). This is a rare case and its downside is failure to > > > find a free extent (return -ENOSPC) when all the already cached block > > > groups have no free extents. > > > > > > Reported-by: Andrew Nelson <andrew.s.nel...@gmail.com> > > > Link: > > > https://lore.kernel.org/linux-btrfs/captelenq9x5kowuq+fa7h1r3nsjg8vyith8+ifjurc_duhh...@mail.gmail.com/ > > > Fixes: 9d66e233c704 ("Btrfs: load free space cache if it exists") > > > Tested-by: Andrew Nelson <andrew.s.nel...@gmail.com> > > > Signed-off-by: Filipe Manana <fdman...@suse.com> > > > > Great, thanks, > > > > Reviewed-by: Josef Bacik <jo...@toxicpanda.com> > > So this makes many fstests occasionally fail with aborted transaction > due to ENOSPC. > It's late and I haven't verified yet, but I suppose this is because we > always skip loading the cache regardless of currently being COWing an > existing leaf or allocating a new one (growing the tree). > Needs to be fixed. >
How about we just use path->search_commit_root? If we're loading the cache we just want the last committed version, it's not like we read it after we've written it. Then we can go back to business as usual. Thanks, Josef