On Thu, Jun 25, 2015 at 03:23:59PM +0100, Filipe Manana wrote: > On Thu, Jun 25, 2015 at 3:20 PM, Liu Bo <[email protected]> wrote: > > On Thu, Jun 25, 2015 at 04:17:46AM +0100, [email protected] wrote: > >> From: Filipe Manana <[email protected]> > >> > >> When we have the no_holes feature enabled, if a we truncate a file to a > >> smaller size, truncate it again but to a size greater than or equals to > >> its original size and fsync it, the log tree will not have any information > >> about the hole covering the range [truncate_1_offset, new_file_size[. > >> Which means if the fsync log is replayed, the file will remain with the > >> state it had before both truncate operations. > > > > Does the fs/subvol tree get updated to the right information at this > > time? > > No, and that's the problem. Because no file extent items are stored in > the log tree. > The inode item is updated with the new i_size however (as expected).
Yeap, that's right and the patch looks right. I do appreciate your great work on fixing these corner cases, but as of my understanding, they really can be taken by a force commit transaction, do they deserve these complex stuff? After all, like punch_hole, remove xattr, they're rare cases. Thanks, -liubo > > thanks > > > > > Thanks, > > > > -liubo > >> > >> Without the no_holes feature this does not happen, since when the inode > >> is logged (full sync flag is set) it will find in the fs/subvol tree a > >> leaf with a generation matching the current transaction id that has an > >> explicit extent item representing the hole. > >> > >> Fix this by adding an explicit extent item representing a hole between > >> the last extent and the inode's i_size if we are doing a full sync. > >> > >> The issue is easy to reproduce with the following test case for fstests: > >> > >> . ./common/rc > >> . ./common/filter > >> . ./common/dmflakey > >> > >> _need_to_be_root > >> _supported_fs generic > >> _supported_os Linux > >> _require_scratch > >> _require_dm_flakey > >> > >> # This test was motivated by an issue found in btrfs when the btrfs > >> # no-holes feature is enabled (introduced in kernel 3.14). So enable > >> # the feature if the fs being tested is btrfs. > >> if [ $FSTYP == "btrfs" ]; then > >> _require_btrfs_fs_feature "no_holes" > >> _require_btrfs_mkfs_feature "no-holes" > >> MKFS_OPTIONS="$MKFS_OPTIONS -O no-holes" > >> fi > >> > >> rm -f $seqres.full > >> > >> _scratch_mkfs >>$seqres.full 2>&1 > >> _init_flakey > >> _mount_flakey > >> > >> # Create our test files and make sure everything is durably persisted. > >> $XFS_IO_PROG -f -c "pwrite -S 0xaa 0 64K" \ > >> -c "pwrite -S 0xbb 64K 61K" \ > >> $SCRATCH_MNT/foo | _filter_xfs_io > >> $XFS_IO_PROG -f -c "pwrite -S 0xee 0 64K" \ > >> -c "pwrite -S 0xff 64K 61K" \ > >> $SCRATCH_MNT/bar | _filter_xfs_io > >> sync > >> > >> # Now truncate our file foo to a smaller size (64Kb) and then truncate > >> # it to the size it had before the shrinking truncate (125Kb). Then > >> # fsync our file. If a power failure happens after the fsync, we expect > >> # our file to have a size of 125Kb, with the first 64Kb of data having > >> # the value 0xaa and the second 61Kb of data having the value 0x00. > >> $XFS_IO_PROG -c "truncate 64K" \ > >> -c "truncate 125K" \ > >> -c "fsync" \ > >> $SCRATCH_MNT/foo > >> > >> # Do something similar to our file bar, but the first truncation sets > >> # the file size to 0 and the second truncation expands the size to the > >> # double of what it was initially. > >> $XFS_IO_PROG -c "truncate 0" \ > >> -c "truncate 253K" \ > >> -c "fsync" \ > >> $SCRATCH_MNT/bar > >> > >> _load_flakey_table $FLAKEY_DROP_WRITES > >> _unmount_flakey > >> > >> # Allow writes again, mount to trigger log replay and validate file > >> # contents. > >> _load_flakey_table $FLAKEY_ALLOW_WRITES > >> _mount_flakey > >> > >> # We expect foo to have a size of 125Kb, the first 64Kb of data all > >> # having the value 0xaa and the remaining 61Kb to be a hole (all bytes > >> # with value 0x00). > >> echo "File foo content after log replay:" > >> od -t x1 $SCRATCH_MNT/foo > >> > >> # We expect bar to have a size of 253Kb and no extents (any byte read > >> # from bar has the value 0x00). > >> echo "File bar content after log replay:" > >> od -t x1 $SCRATCH_MNT/bar > >> > >> status=0 > >> exit > >> > >> The expected file contents in the golden output are: > >> > >> File foo content after log replay: > >> 0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa > >> * > >> 0200000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> * > >> 0372000 > >> File bar content after log replay: > >> 0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> * > >> 0772000 > >> > >> Without this fix, their contents are: > >> > >> File foo content after log replay: > >> 0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa > >> * > >> 0200000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb > >> * > >> 0372000 > >> File bar content after log replay: > >> 0000000 ee ee ee ee ee ee ee ee ee ee ee ee ee ee ee ee > >> * > >> 0200000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > >> * > >> 0372000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >> * > >> 0772000 > >> > >> A test case submission for fstests follows soon. > >> > >> Signed-off-by: Filipe Manana <[email protected]> > >> --- > >> fs/btrfs/tree-log.c | 108 > >> ++++++++++++++++++++++++++++++++++++++++++++++++++++ > >> 1 file changed, 108 insertions(+) > >> > >> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c > >> index 7ac45cf..ac90336 100644 > >> --- a/fs/btrfs/tree-log.c > >> +++ b/fs/btrfs/tree-log.c > >> @@ -4203,6 +4203,107 @@ static int btrfs_log_all_xattrs(struct > >> btrfs_trans_handle *trans, > >> return 0; > >> } > >> > >> +/* > >> + * If the no holes feature is enabled we need to make sure any hole > >> between the > >> + * last extent and the i_size of our inode is explicitly marked in the > >> log. This > >> + * is to make sure that doing something like: > >> + * > >> + * 1) create file with 128Kb of data > >> + * 2) truncate file to 64Kb > >> + * 3) truncate file to 256Kb > >> + * 4) fsync file > >> + * 5) <crash/power failure> > >> + * 6) mount fs and trigger log replay > >> + * > >> + * Will give us a file with a size of 256Kb, the first 64Kb of data match > >> what > >> + * the file had in its first 64Kb of data at step 1 and the last 192Kb of > >> the > >> + * file correspond to a hole. The presence of explicit holes in a log > >> tree is > >> + * what guarantees that log replay will remove/adjust file extent items > >> in the > >> + * fs/subvol tree. > >> + * > >> + * Here we do not need to care about holes between extents, that is > >> already done > >> + * by copy_items(). We also only need to do this in the full sync path, > >> where we > >> + * lookup for extents from the fs/subvol tree only. In the fast path > >> case, we > >> + * lookup the list of modified extent maps and if any represents a hole, > >> we > >> + * insert a corresponding extent representing a hole in the log tree. > >> + */ > >> +static int btrfs_log_trailing_hole(struct btrfs_trans_handle *trans, > >> + struct btrfs_root *root, > >> + struct inode *inode, > >> + struct btrfs_path *path) > >> +{ > >> + int ret; > >> + struct btrfs_key key; > >> + u64 hole_start; > >> + u64 hole_size; > >> + struct extent_buffer *leaf; > >> + struct btrfs_root *log = root->log_root; > >> + const u64 ino = btrfs_ino(inode); > >> + const u64 i_size = i_size_read(inode); > >> + > >> + if (!btrfs_fs_incompat(root->fs_info, NO_HOLES)) > >> + return 0; > >> + > >> + key.objectid = ino; > >> + key.type = BTRFS_EXTENT_DATA_KEY; > >> + key.offset = (u64)-1; > >> + > >> + ret = btrfs_search_slot(NULL, root, &key, path, 0, 0); > >> + ASSERT(ret != 0); > >> + if (ret < 0) > >> + return ret; > >> + > >> + ASSERT(path->slots[0] > 0); > >> + path->slots[0]--; > >> + leaf = path->nodes[0]; > >> + btrfs_item_key_to_cpu(leaf, &key, path->slots[0]); > >> + > >> + if (key.objectid != ino || key.type != BTRFS_EXTENT_DATA_KEY) { > >> + /* inode does not have any extents */ > >> + hole_start = 0; > >> + hole_size = i_size; > >> + } else { > >> + struct btrfs_file_extent_item *extent; > >> + u64 len; > >> + > >> + /* > >> + * If there's an extent beyond i_size, an explicit hole was > >> + * already inserted by copy_items(). > >> + */ > >> + if (key.offset >= i_size) > >> + return 0; > >> + > >> + extent = btrfs_item_ptr(leaf, path->slots[0], > >> + struct btrfs_file_extent_item); > >> + > >> + if (btrfs_file_extent_type(leaf, extent) == > >> + BTRFS_FILE_EXTENT_INLINE) { > >> + len = btrfs_file_extent_inline_len(leaf, > >> + path->slots[0], > >> + extent); > >> + ASSERT(len == i_size); > >> + return 0; > >> + } > >> + > >> + len = btrfs_file_extent_num_bytes(leaf, extent); > >> + /* Last extent goes beyond i_size, no need to log a hole. */ > >> + if (key.offset + len > i_size) > >> + return 0; > >> + hole_start = key.offset + len; > >> + hole_size = i_size - hole_start; > >> + } > >> + btrfs_release_path(path); > >> + > >> + /* Last extent ends at i_size. */ > >> + if (hole_size == 0) > >> + return 0; > >> + > >> + hole_size = ALIGN(hole_size, root->sectorsize); > >> + ret = btrfs_insert_file_extent(trans, log, ino, hole_start, 0, 0, > >> + hole_size, 0, hole_size, 0, 0, 0); > >> + return ret; > >> +} > >> + > >> /* log a single inode in the tree log. > >> * At least one parent directory for this inode must exist in the tree > >> * or be logged already. > >> @@ -4466,6 +4567,13 @@ next_slot: > >> err = btrfs_log_all_xattrs(trans, root, inode, path, dst_path); > >> if (err) > >> goto out_unlock; > >> + if (max_key.type >= BTRFS_EXTENT_DATA_KEY && !fast_search) { > >> + btrfs_release_path(path); > >> + btrfs_release_path(dst_path); > >> + err = btrfs_log_trailing_hole(trans, root, inode, path); > >> + if (err) > >> + goto out_unlock; > >> + } > >> log_extents: > >> btrfs_release_path(path); > >> btrfs_release_path(dst_path); > >> -- > >> 2.1.3 > >> > >> -- > >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > >> the body of a message to [email protected] > >> More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to [email protected] More majordomo info at http://vger.kernel.org/majordomo-info.html
