On Mon, Apr 16, 2018 at 12:52 AM, Theodore Y. Ts'o <ty...@mit.edu> wrote:
> On Sun, Apr 15, 2018 at 07:10:52PM -0500, Vijay Chidambaram wrote:
>> I don't think this is what the paper's ext3-fast does. All the paper
>> says is if you have a file system where the fsync of a file persisted
>> only data related to that file, it would increase performance.
>> ext3-fast is the name given to such a file system. Note that we do not
>> present a design of ext3-fast or analyze it in any detail. In fact, we
>> explicitly say "The ext3-fast file system (derived from inferences
>> provided by ALICE) seems interesting for application safety, though
>> further investigation is required into the validity of its design."
> Well, it says that it's based on ext3's data=journal "Abstract Persistent
> Model". It's true that a design was not proposed --- but if you
> don't propose a design, how do you know what the performance is or
> whether it's even practical? That's one of those things I find
> extremely distasteful in the paper. Sure, I can model a faster than
> light interstellar engine ala Star Trek's Warp Drive --- and I can
> talk about it having, say, better performance than a reaction drive.
> But it doesn't tell us anything useful about whether it can be built,
> or whether it's even useful to dream about it.
> To me, that part of the paper really read as, "watch as I wave my
> hands around wildly; notice that they never leave the ends of my arms!"
I partially understand where you are coming from, but your argument
seems to boil down to "don't say anything until you have worked out
every detail". I don't agree with this. Yes, it was speculative, but
we did have a fairly clear disclaimer.
To the point about it being obvious: you might be surprised at how
many people outside this community take it for granted that if you
fsync a file, only that file's contents and metadata will be persisted
:) So it was obvious to you, but truly shocking for many.
Btw, ext3-fast is what led to our CCFS work in FAST 17:
http://www.cs.utexas.edu/~vijay/papers/fast17-c2fs.pdf. In this paper,
we do show that if you divide your application writes into streams, it
is possible to persist only the data/metadata of one stream,
independent of the IO being done in other streams. So as it turned
out, it wasn't an impossible file-system design.
But we digress. I think we both agree that researchers should engage
more with the file-system community.
>> Thanks! As I mentioned before, this is useful. I have a follow-up
>> question. Consider the following workload:
>> creat foo
>> link (foo, A/bar)
>> In this case, after the file system recovers, do we expect foo's link
>> count to be 2 or 1? I would say 2, but POSIX is silent on this, so I
>> thought I would confirm. The tricky part here is we are not calling
>> fsync() on directory A.
>> In this case, it's not a symlink; it's a hard link, so I would say the
>> link count for foo should be 2. But btrfs and F2FS show link count of
>> 1 after a crash.
> Well, is the link count accurate? That is to say, does A/bar exist?
> I would think that the requirement that the file system be self
> consistent is the most important consideration.
There are two ways to look at this.
1. A/bar does not exist, link count is 1, and so it is not a bug.
2. We are calling fsync on the inode when the inode's link count is 2.
So it should persist the inode plus the dependency that is A/bar. The
file system after a crash should show both A/bar and the file with
link count 2. This is what ext4, xfs, and F2FS do.
We've posted separately to figure out what semantics btrfs supports.
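For anyone who wants to try the workload above, here is a minimal sketch in Python. It assumes the fsync in question is an fsync of foo issued after the link() call, which is what interpretation 2 implies; the function name run_workload and the temporary directory are only illustrative. The script cannot simulate a crash, so it only shows the state that fsync is asked to persist. Checking what survives requires cutting power or using a crash-testing tool right after the fsync returns.

```python
import os
import tempfile


def run_workload(root):
    """Run: creat foo; link(foo, A/bar); fsync(foo).

    Returns foo's link count as seen before any crash.
    Directory A is deliberately never fsynced.
    """
    foo = os.path.join(root, "foo")
    dir_a = os.path.join(root, "A")
    bar = os.path.join(dir_a, "bar")

    os.mkdir(dir_a)

    # creat foo
    fd = os.open(foo, os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        # link(foo, A/bar): a hard link, so foo's inode now has two names
        os.link(foo, bar)
        # fsync(foo): assumed step; issued while the link count is 2
        os.fsync(fd)
    finally:
        os.close(fd)

    return os.stat(foo).st_nlink


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print("link count before crash:", run_workload(root))
```

After a crash and recovery, interpretation 1 corresponds to foo having st_nlink == 1 with A/bar missing, and interpretation 2 to st_nlink == 2 with A/bar present. A link count of 2 with A/bar missing would be the inconsistent case.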
Linux-f2fs-devel mailing list