On Wed, Nov 28, 2018 at 02:54:28PM +0000, fdman...@kernel.org wrote:
> From: Filipe Manana <fdman...@suse.com>
>
> The log tree has a long-standing problem that when a file is fsync'ed we
> only check for new ancestors, created in the current transaction, by
> following only the hard link for which the fsync was issued. We follow the
> ancestors using the VFS' dget_parent() API. This means that if we create a
> new link for a file in a directory that is new (or in any other new
> ancestor directory) and then fsync the file using an old hard link, we end
> up not logging the new ancestor, and on log replay that new hard link and
> ancestor do not exist. In some cases, involving renames, the file will not
> exist at all.
>
> Example:
>
>   mkfs.btrfs -f /dev/sdb
>   mount /dev/sdb /mnt
>
>   mkdir /mnt/A
>   touch /mnt/foo
>   ln /mnt/foo /mnt/A/bar
>   xfs_io -c fsync /mnt/foo
>
>   <power failure>
>
> In this example after log replay only the hard link named 'foo' exists
> and directory A does not exist, which is unexpected. In other major Linux
> filesystems, such as ext4, xfs and f2fs for example, both hard links exist
> and so does directory A after mounting the filesystem again.
>
> Checking whether any ancestors are new and need to be logged was added in
> 2009 by commit 12fcfd22fe5b ("Btrfs: tree logging unlink/rename fixes"),
> however only for the ancestors of the hard link (dentry) for which the
> fsync was issued, instead of checking for all ancestors for all of the
> inode's hard links.
>
> So fix this by tracking the id of the last transaction where a hard link
> was created for an inode and then on fsync fall back to a full transaction
> commit when an inode has more than one hard link and at least one new hard
> link was created in the current transaction. This is the simplest solution
> since this is not a common use case (frequently adding hard links for
> which there's an ancestor created in the current transaction and then
> fsync the file). In case it ever becomes a common use case, a solution
> that consists of iterating the fs/subvol btree for each hard link and
> checking if any ancestor is new, could be implemented.
>
> This solves many unexpected scenarios reported by Jayashree Mohan and
> Vijay Chidambaram, and for which there is a new test case for fstests
> under review.
>
> Reported-by: Vijay Chidambaram <vvija...@gmail.com>
> Reported-by: Jayashree Mohan <jayashree2...@gmail.com>
> Fixes: 12fcfd22fe5b ("Btrfs: tree logging unlink/rename fixes")
> Signed-off-by: Filipe Manana <fdman...@suse.com>

Added to misc-next, thanks.
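
For readers following along, the fallback rule described in the quoted commit message can be sketched roughly as below. This is only an illustration of the decision, not the patch itself: the struct, its field names (including last_link_trans) and both helper functions are hypothetical stand-ins, and "created in the current transaction" is modelled as a simple equality on transaction ids.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for per-inode state.
 * This is NOT the kernel's struct btrfs_inode.
 */
struct inode_sketch {
    uint32_t nlink;            /* current number of hard links */
    uint64_t last_link_trans;  /* id of the last transaction that added a hard link */
};

/* Record that a new hard link was created in transaction `transid`. */
static void note_link_created(struct inode_sketch *inode, uint64_t transid)
{
    inode->nlink++;
    inode->last_link_trans = transid;
}

/*
 * Return true when fsync should fall back to a full transaction commit:
 * the inode has more than one hard link and at least one of them was
 * created in the current transaction.
 */
static bool fsync_needs_full_commit(const struct inode_sketch *inode,
                                    uint64_t cur_transid)
{
    return inode->nlink > 1 && inode->last_link_trans == cur_transid;
}
```

In the example from the commit message, 'foo' gains the link 'A/bar' in the same transaction as the fsync, so the check would return true and the whole transaction, including new directory A, would be committed.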