On Thu, Jun 25, 2015 at 3:20 PM, Liu Bo <bo.li....@oracle.com> wrote: > On Thu, Jun 25, 2015 at 04:17:46AM +0100, fdman...@kernel.org wrote: >> From: Filipe Manana <fdman...@suse.com> >> >> When we have the no_holes feature enabled, if a we truncate a file to a >> smaller size, truncate it again but to a size greater than or equals to >> its original size and fsync it, the log tree will not have any information >> about the hole covering the range [truncate_1_offset, new_file_size[. >> Which means if the fsync log is replayed, the file will remain with the >> state it had before both truncate operations. > > Does the fs/subvol tree get updated to the right information at this > time?
No, and that's the problem. Because no file extent items are stored in the log tree. The inode item is updated with the new i_size however (as expected). thanks > > Thanks, > > -liubo >> >> Without the no_holes feature this does not happen, since when the inode >> is logged (full sync flag is set) it will find in the fs/subvol tree a >> leaf with a generation matching the current transaction id that has an >> explicit extent item representing the hole. >> >> Fix this by adding an explicit extent item representing a hole between >> the last extent and the inode's i_size if we are doing a full sync. >> >> The issue is easy to reproduce with the following test case for fstests: >> >> . ./common/rc >> . ./common/filter >> . ./common/dmflakey >> >> _need_to_be_root >> _supported_fs generic >> _supported_os Linux >> _require_scratch >> _require_dm_flakey >> >> # This test was motivated by an issue found in btrfs when the btrfs >> # no-holes feature is enabled (introduced in kernel 3.14). So enable >> # the feature if the fs being tested is btrfs. >> if [ $FSTYP == "btrfs" ]; then >> _require_btrfs_fs_feature "no_holes" >> _require_btrfs_mkfs_feature "no-holes" >> MKFS_OPTIONS="$MKFS_OPTIONS -O no-holes" >> fi >> >> rm -f $seqres.full >> >> _scratch_mkfs >>$seqres.full 2>&1 >> _init_flakey >> _mount_flakey >> >> # Create our test files and make sure everything is durably persisted. >> $XFS_IO_PROG -f -c "pwrite -S 0xaa 0 64K" \ >> -c "pwrite -S 0xbb 64K 61K" \ >> $SCRATCH_MNT/foo | _filter_xfs_io >> $XFS_IO_PROG -f -c "pwrite -S 0xee 0 64K" \ >> -c "pwrite -S 0xff 64K 61K" \ >> $SCRATCH_MNT/bar | _filter_xfs_io >> sync >> >> # Now truncate our file foo to a smaller size (64Kb) and then truncate >> # it to the size it had before the shrinking truncate (125Kb). Then >> # fsync our file. If a power failure happens after the fsync, we expect >> # our file to have a size of 125Kb, with the first 64Kb of data having >> # the value 0xaa and the second 61Kb of data having the value 0x00. >> $XFS_IO_PROG -c "truncate 64K" \ >> -c "truncate 125K" \ >> -c "fsync" \ >> $SCRATCH_MNT/foo >> >> # Do something similar to our file bar, but the first truncation sets >> # the file size to 0 and the second truncation expands the size to the >> # double of what it was initially. >> $XFS_IO_PROG -c "truncate 0" \ >> -c "truncate 253K" \ >> -c "fsync" \ >> $SCRATCH_MNT/bar >> >> _load_flakey_table $FLAKEY_DROP_WRITES >> _unmount_flakey >> >> # Allow writes again, mount to trigger log replay and validate file >> # contents. >> _load_flakey_table $FLAKEY_ALLOW_WRITES >> _mount_flakey >> >> # We expect foo to have a size of 125Kb, the first 64Kb of data all >> # having the value 0xaa and the remaining 61Kb to be a hole (all bytes >> # with value 0x00). >> echo "File foo content after log replay:" >> od -t x1 $SCRATCH_MNT/foo >> >> # We expect bar to have a size of 253Kb and no extents (any byte read >> # from bar has the value 0x00). >> echo "File bar content after log replay:" >> od -t x1 $SCRATCH_MNT/bar >> >> status=0 >> exit >> >> The expected file contents in the golden output are: >> >> File foo content after log replay: >> 0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa >> * >> 0200000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> * >> 0372000 >> File bar content after log replay: >> 0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> * >> 0772000 >> >> Without this fix, their contents are: >> >> File foo content after log replay: >> 0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa >> * >> 0200000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb >> * >> 0372000 >> File bar content after log replay: >> 0000000 ee ee ee ee ee ee ee ee ee ee ee ee ee ee ee ee >> * >> 0200000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >> * >> 0372000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> * >> 0772000 >> >> A test case submission for fstests follows soon. >> >> Signed-off-by: Filipe Manana <fdman...@suse.com> >> --- >> fs/btrfs/tree-log.c | 108 >> ++++++++++++++++++++++++++++++++++++++++++++++++++++ >> 1 file changed, 108 insertions(+) >> >> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c >> index 7ac45cf..ac90336 100644 >> --- a/fs/btrfs/tree-log.c >> +++ b/fs/btrfs/tree-log.c >> @@ -4203,6 +4203,107 @@ static int btrfs_log_all_xattrs(struct >> btrfs_trans_handle *trans, >> return 0; >> } >> >> +/* >> + * If the no holes feature is enabled we need to make sure any hole between >> the >> + * last extent and the i_size of our inode is explicitly marked in the log. >> This >> + * is to make sure that doing something like: >> + * >> + * 1) create file with 128Kb of data >> + * 2) truncate file to 64Kb >> + * 3) truncate file to 256Kb >> + * 4) fsync file >> + * 5) <crash/power failure> >> + * 6) mount fs and trigger log replay >> + * >> + * Will give us a file with a size of 256Kb, the first 64Kb of data match >> what >> + * the file had in its first 64Kb of data at step 1 and the last 192Kb of >> the >> + * file correspond to a hole. The presence of explicit holes in a log tree >> is >> + * what guarantees that log replay will remove/adjust file extent items in >> the >> + * fs/subvol tree. >> + * >> + * Here we do not need to care about holes between extents, that is already >> done >> + * by copy_items(). We also only need to do this in the full sync path, >> where we >> + * lookup for extents from the fs/subvol tree only. In the fast path case, >> we >> + * lookup the list of modified extent maps and if any represents a hole, we >> + * insert a corresponding extent representing a hole in the log tree. >> + */ >> +static int btrfs_log_trailing_hole(struct btrfs_trans_handle *trans, >> + struct btrfs_root *root, >> + struct inode *inode, >> + struct btrfs_path *path) >> +{ >> + int ret; >> + struct btrfs_key key; >> + u64 hole_start; >> + u64 hole_size; >> + struct extent_buffer *leaf; >> + struct btrfs_root *log = root->log_root; >> + const u64 ino = btrfs_ino(inode); >> + const u64 i_size = i_size_read(inode); >> + >> + if (!btrfs_fs_incompat(root->fs_info, NO_HOLES)) >> + return 0; >> + >> + key.objectid = ino; >> + key.type = BTRFS_EXTENT_DATA_KEY; >> + key.offset = (u64)-1; >> + >> + ret = btrfs_search_slot(NULL, root, &key, path, 0, 0); >> + ASSERT(ret != 0); >> + if (ret < 0) >> + return ret; >> + >> + ASSERT(path->slots[0] > 0); >> + path->slots[0]--; >> + leaf = path->nodes[0]; >> + btrfs_item_key_to_cpu(leaf, &key, path->slots[0]); >> + >> + if (key.objectid != ino || key.type != BTRFS_EXTENT_DATA_KEY) { >> + /* inode does not have any extents */ >> + hole_start = 0; >> + hole_size = i_size; >> + } else { >> + struct btrfs_file_extent_item *extent; >> + u64 len; >> + >> + /* >> + * If there's an extent beyond i_size, an explicit hole was >> + * already inserted by copy_items(). >> + */ >> + if (key.offset >= i_size) >> + return 0; >> + >> + extent = btrfs_item_ptr(leaf, path->slots[0], >> + struct btrfs_file_extent_item); >> + >> + if (btrfs_file_extent_type(leaf, extent) == >> + BTRFS_FILE_EXTENT_INLINE) { >> + len = btrfs_file_extent_inline_len(leaf, >> + path->slots[0], >> + extent); >> + ASSERT(len == i_size); >> + return 0; >> + } >> + >> + len = btrfs_file_extent_num_bytes(leaf, extent); >> + /* Last extent goes beyond i_size, no need to log a hole. */ >> + if (key.offset + len > i_size) >> + return 0; >> + hole_start = key.offset + len; >> + hole_size = i_size - hole_start; >> + } >> + btrfs_release_path(path); >> + >> + /* Last extent ends at i_size. */ >> + if (hole_size == 0) >> + return 0; >> + >> + hole_size = ALIGN(hole_size, root->sectorsize); >> + ret = btrfs_insert_file_extent(trans, log, ino, hole_start, 0, 0, >> + hole_size, 0, hole_size, 0, 0, 0); >> + return ret; >> +} >> + >> /* log a single inode in the tree log. >> * At least one parent directory for this inode must exist in the tree >> * or be logged already. >> @@ -4466,6 +4567,13 @@ next_slot: >> err = btrfs_log_all_xattrs(trans, root, inode, path, dst_path); >> if (err) >> goto out_unlock; >> + if (max_key.type >= BTRFS_EXTENT_DATA_KEY && !fast_search) { >> + btrfs_release_path(path); >> + btrfs_release_path(dst_path); >> + err = btrfs_log_trailing_hole(trans, root, inode, path); >> + if (err) >> + goto out_unlock; >> + } >> log_extents: >> btrfs_release_path(path); >> btrfs_release_path(dst_path); >> -- >> 2.1.3 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html