On 29 Apr 2018, at 18:16, Theodore Y. Ts'o wrote:

On Sun, Apr 29, 2018 at 03:55:39PM -0500, Vijay Chidambaram wrote:
In the spirit of clarifying fsync behavior, we have one more case
where we'd like to find out what should be expected.

Consider this:

Mkdir A
Creat A/bar
Fsync A/bar
Rename A to B
Fsync B/bar
-- Crash --
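
For anyone who wants to reproduce this, here is a minimal sketch of the sequence above in Python. The helper name run_workload and the scratch directory are assumptions of this sketch, not part of the original report. The crash has to be injected externally, so the code only issues the system calls in order.

```python
import os
import tempfile


def run_workload(base):
    """Issue the mkdir/creat/fsync/rename/fsync sequence under `base`.

    Hypothetical helper for illustration; returns the final path of bar.
    """
    a = os.path.join(base, "A")
    b = os.path.join(base, "B")

    os.mkdir(a)                                    # Mkdir A

    fd = os.open(os.path.join(a, "bar"),
                 os.O_CREAT | os.O_WRONLY, 0o644)  # Creat A/bar
    os.fsync(fd)                                   # Fsync A/bar
    os.close(fd)

    os.rename(a, b)                                # Rename A to B

    fd = os.open(os.path.join(b, "bar"), os.O_WRONLY)
    os.fsync(fd)                                   # Fsync B/bar
    os.close(fd)
    # -- a crash would be injected here by an external harness --
    return os.path.join(b, "bar")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(run_workload(scratch))
```

Without a crash, bar always ends up under B. The open question is only what a file system may show after the crash.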

A/bar has been fsynced previously, so it's not a newly created file.
After the crash, in ext4 and btrfs, can we expect directory B and
B/bar to exist?

For ext4, no.  The POSIX semantics apply: bar will *either* be in A,
or in B.

Same for btrfs. If the rename for B goes down, it'll be a side effect of other decisions and not on purpose. I'd actually like for the rename to be on disk in the normal case, but we won't always be able to catch it.


If you modify the file bar such that the mod time has been updated,
then fsync(2) --- but not necessarily fdatasync(2) --- will cause the
inode modifications to be committed, and this will cause the
updates to directory B from the rename to be committed as a
side-effect.
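
To make the fsync(2) versus fdatasync(2) distinction concrete, here is a rough sketch of the two calls from the application side. The helper name append_and_sync is an assumption of this sketch. Whether the rename gets committed along with the inode is file system behaviour as described above and cannot be observed from this code.

```python
import os


def append_and_sync(path, data, data_only=False):
    """Append `data` to `path`, then flush it with fsync or fdatasync.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_APPEND)
    try:
        os.write(fd, data)  # modifies the file, so the mod time changes
        if data_only and hasattr(os, "fdatasync"):
            # fdatasync(2) may skip metadata that is not needed to read
            # the data back, such as the modification time.
            os.fdatasync(fd)
        else:
            # fsync(2) also flushes the inode metadata.
            os.fsync(fd)
    finally:
        os.close(fd)
```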

Note though that there are plenty of people who consider this to be a
performance bug, and not a feature, and there have been papers
proposed by your fellow academics that if implemented, would change
this to no longer be true.

In general with these sorts of things it would be useful to reason
about this in the context of real world applications and why they want
such guarantees.  These guarantees can cost performance hits, and so
there is a cost/benefit tradeoff involved.  So my preference is to
negotiate with application writers, and ask *why* they want such
guarantees, and to explore whether there are better ways of achieving
their high level goals before we legislate this to be an iron-clad
commitment which might make application A happy, but performance-seeking
user B unhappy.

I know this is not POSIX compliant, but from prior comments, it seems
like both ext4 and btrfs would like to persist directory entries upon
fsync of newly created files. So we were wondering if this extended to
this case.

We had real world examples of users/applications who suffered data
loss when the directory entries for newly created files were not
persisted.  It was on the basis of these complaints that we made this
commitment, since it seemed more important than the relatively minor
performance hit.


Agreeing with Ted and expanding a bit. If fsync(some file) doesn't persist the name for that file, applications need to fsync the directories, which can be double the log commits. Getting everything down to disk in one fsync() is much better for both the application and the FS.
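
As a sketch of what "fsync the directories" looks like from the application side, assuming Linux-style behaviour where a directory can be opened read-only and passed to fsync. The helper names fsync_dir and durable_create are hypothetical.

```python
import os


def fsync_dir(path):
    """Open directory `path` and fsync it so its entries reach disk."""
    flags = os.O_RDONLY | getattr(os, "O_DIRECTORY", 0)
    fd = os.open(path, flags)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def durable_create(dirpath, name, data):
    """Create dirpath/name, then sync the file and its directory.

    Hypothetical helper: it issues two fsync calls, which is the extra
    work an application does when it cannot rely on the file fsync
    also persisting the name.
    """
    path = os.path.join(dirpath, name)
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)        # first fsync: the file itself
    finally:
        os.close(fd)
    fsync_dir(dirpath)      # second fsync: the directory holding the name
    return path
```

If the file system persists the name on the first fsync, the fsync_dir call is the part the application could drop.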

-chris