On 1/27/21 5:34 AM, fdman...@kernel.org wrote:
From: Filipe Manana <fdman...@suse.com>

Whenever we fsync an inode, if it is a directory, a regular file that was
created in the current transaction or has last_unlink_trans set to the
generation of the current transaction, we check if any of its ancestor
inodes (and the inode itself if it is a directory) cannot be logged and
need a fallback to a full transaction commit - if so, we return with a
value of 1 in order to fall back to a transaction commit.

However, we often do not need to fall back to a transaction commit because:

1) The ancestor inode is not an immediate parent, and therefore there is
    no explicit request to log it, and it is needed neither to guarantee
    the consistency of the inode originally asked to be logged (fsynced)
    nor that of its immediate parent;

2) The ancestor inode was already logged before, in which case any link,
    unlink or rename operation updates the log as needed.

So for these two cases we can avoid an unnecessary transaction commit.
Therefore remove check_parent_dirs_for_sync() and add a check at the top
of btrfs_log_inode() to make us fall back immediately to a transaction
commit when we are logging a directory inode that cannot be logged and
needs a full transaction commit. All we need to protect is the case where
after renaming a file someone fsyncs only the old directory, which would
result in losing the renamed file after a log replay.

This patch is part of a patchset comprised of the following patches:

   btrfs: remove unnecessary directory inode item update when deleting dir entry
   btrfs: stop setting nbytes when filling inode item for logging
   btrfs: avoid logging new ancestor inodes when logging new inode
   btrfs: skip logging directories already logged when logging all parents
   btrfs: skip logging inodes already logged when logging new entries
   btrfs: remove unnecessary check_parent_dirs_for_sync()
   btrfs: make concurrent fsyncs wait less when waiting for a transaction commit

Performance results, after applying all patches, are mentioned in the
change log of the last patch.

Signed-off-by: Filipe Manana <fdman...@suse.com>

I'm having a hard time with this one.

Previously we would commit the transaction if the inode was a regular file that was created in the current transaction and had been renamed. With this patch, you only commit the transaction if the inode is a directory that was itself renamed. Before, if you already had directories A and B and then did something like

echo "foo" > /mnt/test/A/blah
fsync(/mnt/test/A/blah);
fsync(/mnt/test/A);
mv /mnt/test/A/blah /mnt/test/B
fsync(/mnt/test/B/blah);

we would commit the transaction on that last fsync (the second fsync of the file), but with your patch we do not. I suppose that's in keeping with how fsync is allowed to work, but it's definitely a change in behavior from what we used to do. Not sure if that's good or not, I'll have to think about it. Thanks,

Josef
