Re: Symlink not persisted even after fsync

Dave Chinner Sat, 14 Apr 2018 14:56:24 -0700

On Fri, Apr 13, 2018 at 10:27:56PM -0500, Vijay Chidambaram wrote:
> Hi Dave,
> 
> Thanks for the reply.
> 
> I feel like we are not talking about the same thing here.
> 
> What we are asking is: if you perform
> 
> fsync(symlink)
> crash
> 
> can we expect it to see the symlink file in the parent directory after
> a crash given we didn't fsync the parent directory? Amir argues we
> can't expect it. Your first email seemed to argue we should expect it.


My first email comments on Amir's quoting of behaviours for files vs
directories on fsync, and then applying those caveats to symlinks.
It probably wasn't that clear that I was mainly trying to point out
that symlinks are not files, so they have different ordering
requirements. i.e. you have to look at the ordering requirements of
the filesystems, not the fsync() specification, to determine what the
fsync behaviour is supposed to be.

My second email clarifies the ordering behaviour that is expected
with symlinks and the reason why you'll see different behaviour to
files w.r.t. fsync and parent directories.

> ext4 and xfs have this behavior, which Amir argues is an
> implementation side-effect, and not intended.
> 
> >> >>> 1. symlink (foo, bar.tmp)
> >> >>> 2. open bar.tmp
> >> >>> 3. fsync bar.tmp
> >> >>> 4. rename(bar.tmp, bar)
> >> >>> 5. fsync bar
> >> >>> ----crash here----
> 
> The second workload that Amir constructed just moves the symlink
> creation into a different transaction. In both workloads, we are
> creating or renaming new symlinks and calling fsync on them. In both
> cases we are not explicitly calling fsync on the parent directory.

Yes, I decided not to write all this "symlink behaviour is dependent
on initial conditions" stuff because, AFAIC, it is a pretty obvious
conclusion to draw from the ordering dependencies I described
between the symlink and the object it points at.

A script that demonstrates this is simple:

$ cat t.sh
#!/bin/bash

dev=/dev/vdb
mnt=/mnt/scratch
test_file=$mnt/foo

# 1. symlink (foo, bar.tmp)
# 2. open bar.tmp
# 3. fsync bar.tmp
# 4. rename(bar.tmp, bar)
# 5. fsync bar

umount $mnt
mount $dev $mnt

cd $mnt
rm -f foo bar.tmp bar
sync

# Don't fsync creation of foo, will see foo and bar.tmp after shutdown
touch foo
ln -s foo bar.tmp
xfs_io -c fsync bar.tmp
mv bar.tmp bar
xfs_io -c fsync bar
xfs_io -xc "shutdown" $mnt

cd ~
umount $mnt
mount $dev $mnt
cd $mnt
ls -l $mnt
rm -f foo bar.tmp bar
sync

# don't fsync foo or bar.tmp, will see foo and bar after shutdown
touch foo
xfs_io -c fsync foo

touch foo
ln -s foo bar.tmp
mv bar.tmp bar
xfs_io -c fsync bar
xfs_io -xc "shutdown" $mnt


cd ~
umount $mnt
mount $dev $mnt
cd $mnt
ls -l $mnt
rm -f foo bar.tmp bar
sync

# fsync creation of foo, will see only foo after shutdown
touch foo
xfs_io -c fsync foo

ln -s foo bar.tmp
xfs_io -c fsync bar.tmp
mv bar.tmp bar
xfs_io -c fsync bar
xfs_io -xc "shutdown" $mnt

cd ~
umount $mnt
mount $dev $mnt
cd $mnt
ls -l $mnt
$

And the output is:

$ sudo umount /mnt/scratch ; sudo mount /dev/vdb /mnt/scratch ; sudo ./t.sh ;
total 0
lrwxrwxrwx. 1 root root 3 Apr 14 09:52 bar.tmp -> foo
-rw-r--r--. 1 root root 0 Apr 14 09:52 foo
total 0
lrwxrwxrwx. 1 root root 3 Apr 14 09:52 bar -> foo
-rw-r--r--. 1 root root 0 Apr 14 09:52 foo
total 0
-rw-r--r--. 1 root root 0 Apr 14 09:52 foo
$

i.e. it depends on the state of the original file as to what is
captured by the fsync of that file through the symlink. In other
words, symlinks have no ordering dependency with the object resolved
from the path in the symlink.


> Note that we are not saying if we call fsync on symlink file, it
> should call fsync on the original file. We agree that should not be
> done as the symlink file and the original link are two distinct
> entities.

"symlink file" - there's no such thing. It's either a symlink or a
regular file and it can't be both.

And, well, you can't fsync a symlink *inode*, anyway, because you
can't open it directly for IO operations.
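
A minimal sketch of that point, assuming bash on a Linux system: opening
a path that names a symlink follows the link, so the descriptor (and any
fsync issued on it) refers to the target's inode. The scratch directory
and the names foo and bar are illustrative, not taken from the script
above.

```shell
#!/bin/bash
# Sketch only: a path naming a symlink resolves to its target when opened,
# so IO issued "on the symlink" lands on the target inode.
# The scratch directory and the names foo/bar are illustrative.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

touch foo
ln -s foo bar

# The redirection open()s "bar", which follows the link and writes to foo.
echo data > bar

# bar itself is unchanged: it is still a symlink holding the path "foo".
[ -L bar ] && echo "bar is still a symlink"

# -ef follows symlinks and compares device and inode numbers.
[ bar -ef foo ] && echo "opening bar reaches the same inode as foo"

echo "foo contains: $(cat foo)"
```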

> I believe in most journaling/copy-on-write file systems today, if you
> call fsync on a new file, the fsync will persist the directory entry
> of the new file in the parent directory (even though POSIX doesn't
> really require this).

Yes, that's the strict ordering dependency thing I talked about, and
it was something that btrfs got wrong for an awful long time.

> It seems reasonable to extend this persistence
> courtesy to symlinks (considering them just as normal files).

And no, that's not reasonable, because symlinks only contain a path
instead of a direct reference to any filesystem object. i.e. it's an
indirect reference, and that can be clearly seen by the fact that
symlinks are created and removed without referencing the object they
point to or caring whether it is even valid.

There is no way reliable ordering dependencies can be created for
indirect references, especially as symlinks can point to any type of
object (e.g. dir, blkdev, etc), it can point to something outside
the filesystem, and it can even point to something that doesn't
exist.
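
As a small illustration of the "indirect reference" point, here is a
sketch (assuming bash; the scratch directory and the target path are
made up for the example) that creates and removes a symlink whose
target never exists. Neither operation needs the target to resolve.

```shell
#!/bin/bash
# Sketch only: a symlink stores a path, not a reference to an object.
# The scratch directory and the target path are illustrative.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Creation succeeds even though nothing exists at the target path.
ln -s "$tmp/no/such/target" dangling

[ -L dangling ] && echo "dangling is a symlink"
[ -e dangling ] || echo "its target does not exist"

# All the symlink holds is the path string it was given.
stored=$(readlink dangling)
echo "stored path: $stored"

# Removal only unlinks the symlink; the path it names is never consulted.
rm dangling
```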

This also means that "fsync on a symlink" may, in fact, run a fsync
method of a completely different filesystem or subsystem. There is
no way this could possibly trigger a directory fsync of the symlink
parent, because the object being fsync()d may not even know what a
filesystem is...

If you want a symlink to have ordering behaviour like a dirent
pointing to a regular file, then use hard links....
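
For comparison, a sketch of the same create-then-rename sequence using a
hard link instead of a symlink (assuming bash, with illustrative names
and scratch directory). It only shows that the new name is a direct
reference to the same inode; what survives a crash still depends on the
filesystem and on the fsync calls, which are left commented out so the
sketch runs without xfsprogs.

```shell
#!/bin/bash
# Sketch only: a hard link is a second directory entry for the same inode,
# i.e. a direct reference rather than a stored path.
# The scratch directory and the names are illustrative.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

touch foo
ln foo bar.tmp            # hard link, not "ln -s"
# xfs_io -c fsync bar.tmp # as in t.sh; needs xfsprogs
mv bar.tmp bar
# xfs_io -c fsync bar     # as in t.sh; needs xfsprogs

# bar is not a symlink, and it names the same inode as foo.
[ ! -L bar ] && [ bar -ef foo ] && echo "bar and foo share one inode"
```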

Cheers,

Dave.
-- 
Dave Chinner
[email protected]