On Sat, Mar 31, 2018 at 06:11:56AM +0800, Liu Bo wrote:
> Currently if some fatal errors occur, like all IO get -EIO, resources
> would be cleaned up when
> a) transaction is being committed or
> b) BTRFS_FS_STATE_ERROR is set
>
> However, in some rare cases, resources may be left alone after transaction
> gets aborted and umount may run into some ASSERT(), e.g.
> ASSERT(list_empty(&block_group->dirty_list));
>
> For case a), in btrfs_commit_transaction(), there're several places at the
> beginning where we just call btrfs_end_transaction() without cleaning up
> resources. For case b), it is possible that the trans handle doesn't have
> any dirty stuff, then only trans handle is marked as aborted while
> BTRFS_FS_STATE_ERROR is not set, so resources remain in memory.
>
> This makes btrfs also check BTRFS_FS_STATE_TRANS_ABORTED to make sure that
> all resources won't stay in memory after umount.
>
> Signed-off-by: Liu Bo <bo....@linux.alibaba.com>
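For reference, the check described in the changelog can be modelled in a few
lines of userspace C. This is only a sketch under stated assumptions: the
toy_* names, the bit values and the dirty counter are made up for
illustration and are not the btrfs code or the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the fs_state bits named in the changelog.
 * The bit positions are illustrative, not the kernel's values. */
enum {
	TOY_FS_STATE_ERROR         = 1UL << 0,
	TOY_FS_STATE_TRANS_ABORTED = 1UL << 1,
};

/* Hypothetical, heavily reduced model of the filesystem state. */
struct toy_fs_info {
	unsigned long fs_state;
	int dirty_block_groups;	/* resources still held in memory */
};

/*
 * Model of the unmount-time decision. With check_aborted == false only
 * the ERROR bit triggers cleanup (the behaviour described as current);
 * with check_aborted == true the TRANS_ABORTED bit triggers it as well
 * (the behaviour the changelog proposes).
 * Returns the number of resources left behind after "unmount".
 */
static int toy_close(struct toy_fs_info *fs, bool check_aborted)
{
	unsigned long mask = TOY_FS_STATE_ERROR;

	assert(fs);
	if (check_aborted)
		mask |= TOY_FS_STATE_TRANS_ABORTED;

	if ((fs->fs_state & mask) != 0)
		fs->dirty_block_groups = 0;	/* stands in for the real cleanup */

	return fs->dirty_block_groups;
}
```

In this model, a state with only TOY_FS_STATE_TRANS_ABORTED set and a
non-zero dirty count leaves resources behind when check_aborted is false,
which corresponds to case b) above, and leaves none when it is true.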
Is it possible that the following stacktrace could be caused by the missing
check? It roughly matches what you describe (i.e. close_ctree and unreleased
resources). This is from generic/475, which does some error injection:

[16991.455178] WARNING: CPU: 6 PID: 23518 at fs/btrfs/extent-tree.c:9896 btrfs_free_block_groups+0x2c8/0x420 [btrfs]
[16991.621105]  close_ctree+0x114/0x2d0 [btrfs]
[16991.625482]  generic_shutdown_super+0x6c/0x120
[16991.630025]  kill_anon_super+0xe/0x20
[16991.633820]  btrfs_kill_super+0x13/0x100 [btrfs]
[16991.638550]  deactivate_locked_super+0x3f/0x70
[16991.643332]  cleanup_mnt+0x3b/0x70
[16991.646889]  task_work_run+0x89/0xa0
[16991.650565]  exit_to_usermode_loop+0x79/0xa3
[16991.654985]  do_syscall_64+0xe9/0x110
[16991.658841]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2