On Sat, Apr 14, 2018 at 08:35:45PM -0500, Vijaychidambaram Velayudhan Pillai wrote:
> I was one of the authors on that paper, and I didn't know until today you
> didn't like that work :) The paper did *not* suggest we support invented
> guarantees without considering the performance impact.

I hadn't noticed that you were one of the authors on that paper,
actually.

The problem with that paper was I don't think the researchers had
talked to anyone who had actually designed production file systems.
For example, the hypothetical ext3-fast file system proposed in the
paper has some real practical problems. You can't just switch between
having the file contents journaled via the data=journal mode and
having them written via the normal page cache mechanisms. If you
don't take some very heavy-weight, performance-killing special
measures, data corruption is a very real possibility.

(If you're curious as to why, see the comments in the function
ext4_change_journal_flag() in fs/ext4/inode.c, which is called when
clearing the per-file data journal flag. We need to stop the journal,
write all dirty, journalled buffers to disk, empty the journal, and
only then can we switch a file from using data journalling to the
normal ordered data mode handling. Now imagine ext3-fast needing to
do all of this...)

The paper also talked in terms of what file system designers should
consider; it didn't really make the same recommendation to
application authors. If you look at Table 3(c), which listed
application "vulnerabilities" under current file systems, for the
applications that do purport to provide robustness against crashes
(e.g., Postgres, LMDB, etc.), most of them actually work quite well,
with few or no vulnerabilities. A notable exception is Zookeeper ---
but that might be an example where the application is just buggy, and
should be fixed.

> I don't disagree with any of this. But you can imagine how this can be all
> be confusing to file-system developers and research groups who work on file
> systems: without formal documentation, what exactly should they test or
> support? Clearly current file systems provide more than just POSIX and
> therefore POSIX itself is not very useful.

I agree that documenting what behavior applications can depend upon
is useful. However, this needs to be done as a conversation --- and
a negotiation --- between application and file system developers.
(And not necessarily just from one operating system, either!
Application authors might care about whether they can get robustness
guarantees on other operating systems, such as Mac OS X.) Also, the
tradeoffs may in some cases involve probabilities of data loss, and
not hard guarantees.

Formal documentation also takes a lot of effort to write. That's
probably why no one has tried to formally codify it since POSIX. We
do have informal agreements, such as adding an implied data flush
after certain close or rename operations. And sometimes these are
written up, but only informally. A good example of this is the
O_PONIES controversy, wherein the negotiations/conversation happened
on various blog entries, and ultimately at an LSF/MM face-to-face
meeting:

	http://blahg.josefsipek.net/?p=364
	https://sandeen.net/wordpress/uncategorized/coming-clean-on-o_ponies/
	https://lwn.net/Articles/322823/
	https://lwn.net/Articles/327601/
	https://lwn.net/Articles/351422/

Note that the implied file writebacks after certain renames and
closes (as documented at the end of https://lwn.net/Articles/322823/)
were implemented for ext4, and then after discussion at LSF/MM, there
was general agreement across multiple major file system maintainers
that we should all provide similar behavior.

So doing this kind of standardization, especially if you want to take
into account all of the stakeholders, takes time and is not easy. If
you only take one point of view, you can have what happened with the
C standard, where the room was packed with compiler authors, who were
only interested in what kind of cool compiler optimizations they
could do, and completely ignored whether the resulting standard would
actually be useful to practicing system programmers.
Which is why the Linux kernel is only really supported on gcc, and
then with certain optimizations allowed by the C standard explicitly
turned off. (Clang support is almost there, but not everyone trusts
that a kernel built by Clang won't have some subtle, hard-to-debug
problems...)

Academics could very well have a place in helping to facilitate the
conversation. I think my primary concern with the Pillai paper is
that the authors apparently talked a whole bunch to application
authors, but not nearly as much to file system developers.

> But in any case, coming back to our main question, the conclusion seems to
> be: symlinks aren't standard, so we shouldn't be studying their
> crash-consistency properties. This is useful to know. Thanks!

Well, symlinks are standardized. But what the standards say about
them is extremely limited. And the crash-consistency property you
were looking at, namely what happens when fsync() is called on a file
descriptor opened via a symlink, is definitely not consistent with
either the Posix/SUS standard or historical practice on BSD and other
Unix systems, as well as Linux.

Cheers,

					- Ted
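
To illustrate the symlink point above: opening a path that is a symlink (without O_NOFOLLOW) gives a descriptor for the target file, so fsync() on that descriptor acts on the target, not on the link. The following is only a minimal sketch in Python; the function name and file names are made up for the illustration, and it shows which inode the descriptor refers to, not what any particular file system guarantees after a crash.

```python
import os
import tempfile


def fsync_via_symlink(dirpath):
    """Open a file through a symlink, fsync it, and return the inode
    numbers of (descriptor, target file, symlink itself)."""
    target = os.path.join(dirpath, "target.txt")  # hypothetical names
    link = os.path.join(dirpath, "link")

    with open(target, "w") as f:
        f.write("hello\n")
    os.symlink(target, link)

    # open() without O_NOFOLLOW resolves the symlink, so this descriptor
    # refers to the target file rather than to the link.
    fd = os.open(link, os.O_WRONLY)
    try:
        fd_ino = os.fstat(fd).st_ino
        os.fsync(fd)  # flushes the target file's data and metadata
    finally:
        os.close(fd)

    # Asking for the directory entries (including the symlink's name) to
    # reach disk is a separate request: fsync the containing directory.
    # What survives a crash beyond that is file-system dependent.
    dfd = os.open(dirpath, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)

    return fd_ino, os.stat(target).st_ino, os.lstat(link).st_ino


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        fd_ino, target_ino, link_ino = fsync_via_symlink(d)
        print("descriptor refers to target:", fd_ino == target_ino)
        print("descriptor refers to symlink:", fd_ino == link_ino)
```

The descriptor's inode matches the target and differs from the link's own inode, which is why an fsync() through the symlink says nothing by itself about whether the symlink was persisted.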