On Mon, Nov 12, 2018 at 10:23:58AM +0000, [email protected] wrote:
> From: Filipe Manana <[email protected]>
>
> After the simplification of the fast fsync patch done recently by commit
> b5e6c3e170b7 ("btrfs: always wait on ordered extents at fsync time") and
> commit e7175a692765 ("btrfs: remove the wait ordered logic in the
> log_one_extent path"), we got a very short time window where we can get
> extents logged without writeback completing first or extents logged
> without logging the respective data checksums. Both issues can only happen
> when doing a non-full (fast) fsync.
>
> As soon as we enter btrfs_sync_file() we trigger writeback, then lock the
> inode and then wait for the writeback to complete before starting to log
> the inode. However before we acquire the inode's lock and after we started
> writeback, it's possible that more writes happened and dirtied more pages.
> If that happened and those pages get writeback triggered while we are
> logging the inode (for example, the VM subsystem triggering it due to
> memory pressure, or another concurrent fsync), we end up seeing the
> respective extent maps in the inode's list of modified extents and will
> log matching file extent items without waiting for the respective
> ordered extents to complete, meaning that either of the following will
> happen:
>
> 1) We log an extent after its writeback finishes but before its checksums
>    are added to the csum tree, leading to -EIO errors when attempting to
>    read the extent after a log replay.
>
> 2) We log an extent before its writeback finishes.
>    Therefore after the log replay we will have a file extent item pointing
>    to an unwritten extent (and without the respective data checksums as
>    well).
>
> This could not happen before the fast fsync patch simplification, because
> for any extent we found in the list of modified extents, we would wait for
> its respective ordered extent to finish writeback or collect its checksums
> for logging if it did not complete yet.
>
> Fix this by triggering writeback again after acquiring the inode's lock
> and before waiting for ordered extents to complete.
>
> Fixes: e7175a692765 ("btrfs: remove the wait ordered logic in the
> log_one_extent path")
> Fixes: b5e6c3e170b7 ("btrfs: always wait on ordered extents at fsync time")
> CC: [email protected] # 4.19+
> Signed-off-by: Filipe Manana <[email protected]>

Reviewed-by: Josef Bacik <[email protected]>

Thanks,

Josef
