On Sun, Apr 15, 2018 at 07:10:52PM -0500, Vijay Chidambaram wrote:
> Thanks! As I mentioned before, this is useful. I have a follow-up
> question. Consider the following workload:
> creat foo
> link (foo, A/bar)
> In this case, after the file system recovers, do we expect foo's link
> count to be 2 or 1?
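The quoted workload can be sketched as a small Python script that uses the POSIX wrappers in the `os` module. It makes two assumptions. First, foo is then fsync()ed: the thread discusses fsync(foo), but the quoted workload does not show the call. Second, directory A already exists. The script cannot simulate the crash. It only builds the pre-crash state whose persistence is in question.

```python
import os
import tempfile

def run_workload(root: str) -> int:
    """Run: creat foo; link(foo, A/bar); fsync(foo) under root.

    Returns foo's link count as seen before any crash.
    """
    os.mkdir(os.path.join(root, "A"))  # assumption: directory A pre-exists
    foo = os.path.join(root, "foo")
    # creat(2) is equivalent to open(O_CREAT | O_WRONLY | O_TRUNC)
    fd = os.open(foo, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)
    try:
        os.link(foo, os.path.join(root, "A", "bar"))  # new dirent, nlink 1 -> 2
        os.fsync(fd)  # persist foo; directory A is never fsync()ed
        return os.fstat(fd).st_nlink
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as d:
    print(run_workload(d))  # 2 before the crash
```

The question is whether that count of 2 is still what recovery shows after a crash that follows the fsync().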

So, strictly ordered behaviour (inode A is foo, inode B is foo's
parent directory, and inode C is directory A):

create foo:
  - creates dirent in inode B and new inode A in an atomic
    transaction sequence #1
link foo -> A/bar:
  - creates dirent in inode C and bumps inode A link count in
    an atomic transaction sequence #2.
fsync foo:
  - looks at inode A, sees its "last modification" sequence
    counter as #2
  - flushes all transactions up to and including #2 to stable
    storage.

See the dependency chain? Both the inodes and dirents in the create
operation and the link operation are chained to the inode foo via
the atomic transactions. Hence when we flush foo, we also flush the
dependent changes because of the change atomicity requirements....
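The dependency chain can be illustrated with a toy in-memory model. Everything in it is hypothetical and simplified, and no real filesystem journals exactly this way. Each operation is one atomic transaction with a sequence number, and every inode it touches records that number as its last modification. fsync() of an inode commits every transaction up to and including that inode's last sequence number. After a simulated crash, only the committed transactions are replayed. The result shows why the link to A/bar survives even though directory A (inode C) is never fsync()ed.

```python
from dataclasses import dataclass, field

@dataclass
class Journal:
    """Toy model of strictly ordered metadata transactions (hypothetical)."""
    txns: list = field(default_factory=list)      # (seq, [changes]) in order
    last_mod: dict = field(default_factory=dict)  # inode -> last seq touching it
    committed: int = 0                            # highest seq on stable storage

    def transaction(self, changes):
        """Record one atomic transaction of (inode, op, arg) changes."""
        seq = len(self.txns) + 1
        self.txns.append((seq, changes))
        for inode, _, _ in changes:
            self.last_mod[inode] = seq
        return seq

    def fsync(self, inode):
        """Commit every transaction up to and including inode's last one."""
        self.committed = max(self.committed, self.last_mod.get(inode, 0))

    def recover(self):
        """Replay only the committed transactions, as after a crash."""
        dirents, nlink = {}, {}
        for seq, changes in self.txns:
            if seq > self.committed:
                break
            for inode, op, arg in changes:
                if op == "add_dirent":
                    dirents.setdefault(inode, set()).add(arg)
                elif op == "new_inode":
                    nlink[inode] = 1
                elif op == "inc_nlink":
                    nlink[inode] += 1
        return dirents, nlink

j = Journal()
# create foo: dirent in inode B (foo's parent dir) + new inode A (foo), seq #1
j.transaction([("B", "add_dirent", "foo"), ("A", "new_inode", None)])
# link foo -> A/bar: dirent in inode C (directory A) + bump inode A, seq #2
j.transaction([("C", "add_dirent", "bar"), ("A", "inc_nlink", None)])
j.fsync("A")  # inode A was last modified at seq #2, so #1 and #2 commit
dirents, nlink = j.recover()
print(nlink["A"], dirents["C"])  # 2 {'bar'}
```

The model commits strictly in sequence order. fsync("A") therefore also commits transaction #1, which carries foo's own dirent in inode B.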
> I would say 2,
Correct, for strict ordering. But....
> but POSIX is silent on this,
Well, it's not silent: POSIX explicitly allows fsync() to do
nothing and report success. Hence we can't really look to POSIX to
define how fsync() should behave.
> thought I would confirm. The tricky part here is we are not calling
> fsync() on directory A.
Right. But directory A has a dependent change linked to foo. If we
fsync() foo, we are persisting the link count change in that file,
and hence all the other changes related to that link count change
must also be flushed. Similarly, all the changes related to the
creation of foo must be flushed, too.
> In this case, it's not a symlink; it's a hard link, so I would say the
> link count for foo should be 2.
Right: that's the "reference counted object dependency" I referred
to. That is, it's a bidirectional atomic dependency: either we show
both the new dirent and the link count change, or we show neither of
them. Hence fsync on one object implies that we are also persisting
the related changes in the other object.
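The "both or neither" rule amounts to an invariant that a crash-consistency checker could test on a recovered image: an inode's link count must equal the number of directory entries that reference it. The sketch below is a hypothetical, simplified check over plain dictionaries. It is not taken from any real fsck, and the inode number 7 is made up.

```python
from collections import Counter

def link_counts_consistent(dirents, nlink):
    """Return True if every inode's link count matches its dirent references.

    dirents maps a path (e.g. "A/bar") to the inode number it names;
    nlink maps an inode number to its recovered link count.
    """
    refs = Counter(dirents.values())
    inodes = set(refs) | set(nlink)
    return all(refs[i] == nlink.get(i, 0) for i in inodes)

# Both changes persisted: consistent.
print(link_counts_consistent({"foo": 7, "A/bar": 7}, {7: 2}))  # True
# Neither change persisted: also consistent.
print(link_counts_consistent({"foo": 7}, {7: 1}))              # True
# New dirent without the link count bump: atomicity violated.
print(link_counts_consistent({"foo": 7, "A/bar": 7}, {7: 1}))  # False
```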
> But btrfs and F2FS show link count of
> 1 after a crash.
That may be valid if the dirent A/bar does not exist after recovery,
but it also means fsync() hasn't actually guaranteed that inode
changes made prior to the fsync are persistent on disk. That is,
it's a violation of ordered metadata semantics and probably a bug.
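Under the strict ordering model described in this reply, the recovered states after creat foo; link(foo, A/bar); fsync(foo) can be classified as follows. This is a hypothetical sketch with made-up labels, not a check from any existing crash-testing tool. Whether A/bar survived is an input because the reported result gives only foo's link count. A link count of 1 with no A/bar is self-consistent, but it still loses a change made before the fsync().

```python
def classify(bar_exists: bool, foo_nlink: int) -> str:
    """Classify a post-crash state of: creat foo; link(foo, A/bar); fsync(foo)."""
    expected_nlink = 2 if bar_exists else 1
    if foo_nlink != expected_nlink:
        return "inconsistent: dirent and link count disagree"
    if not bar_exists:
        # Consistent on disk, but the link count change made before the
        # fsync() was not persisted.
        return "consistent, but violates ordered metadata semantics"
    return "strictly ordered"

print(classify(True, 2))   # strictly ordered
print(classify(False, 1))  # consistent, but violates ordered metadata semantics
print(classify(True, 1))   # inconsistent: dirent and link count disagree
```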