On Fri, Oct 28, 2016 at 10:04:12AM +0200, Oswald Buddenhagen wrote:
> > As for truncation, this might still happen if the file is not fsynced
> > explicitly at critical transaction points (including before fclose).
> >
> you're not getting truncation, but data corruption, as that's what
> appending a number of null bytes is. there is _no_ standard that permits
> this without an interim system crash, fsync or not.

Actually, without an fsync ***anything*** goes. In particular, if you append to a file, and the system allocates a new block, it's fair game for the file system to attach a block to the disk, but mark the block as uninitialized, so that reads of that block return zeros. That's not technically data corruption. All of the data up to the last fsync is safe. What happens after the last fsync is up in the air.

The behavior I described is what XFS will do. With ext4, we use delayed allocation, but the way we do data=ordered is that we flush the data blocks *before* we do the commit, so in practice it shouldn't happen with ext4. However, we reserve the right to change how we do things in the future to be more like XFS, since there are some performance advantages to not forcing out the data block, and instead marking the block as uninitialized and then marking it as initialized after the writeback completes.

If you mount with the data=writeback flag, then we don't force out data blocks before we do a commit (which gives a performance advantage, which is why some users might choose to use it), but it means that it's possible for stale data (the previous contents of the data block) to be revealed after a crash.

But (and this is important) it's completely legal as far as the POSIX standard is concerned. So if you care about this, I would strongly recommend that you include a CRC of the contents of the transaction blocks in the commit record.

Also note that technically speaking, although fsync() guarantees that after it returns, everything written is committed to stable store, it makes no guarantee about the *order* in which data is committed to stable store before the fsync() completes.

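A rough Python sketch of the CRC suggestion above, not taken from isync or any real journal format: the record layout (length prefix, payload, commit record with a magic value and a CRC32), the `MAGIC` constant, and the function names `append_transaction` and `replay` are all assumptions made for illustration. Replay stops at the first record whose commit record is missing or does not match, so a tail of zeros or stale data left behind by a crash is ignored.

```python
import os
import struct
import zlib

# Hypothetical on-disk layout for one transaction (illustration only):
#   HEADER: payload length
#   payload bytes
#   COMMIT: magic value + CRC32 of the payload
HEADER = struct.Struct("<I")
COMMIT = struct.Struct("<4sI")
MAGIC = b"CMT1"


def append_transaction(path, payload):
    """Append one transaction and its CRC-bearing commit record, then fsync."""
    record = (
        HEADER.pack(len(payload))
        + payload
        + COMMIT.pack(MAGIC, zlib.crc32(payload))
    )
    with open(path, "ab") as f:
        f.write(record)
        f.flush()             # move Python's buffered data into the kernel
        os.fsync(f.fileno())  # ask the kernel to commit it to stable storage


def replay(path):
    """Return payloads of all leading transactions whose commit record checks out."""
    with open(path, "rb") as f:
        data = f.read()
    good, pos = [], 0
    while pos + HEADER.size <= len(data):
        (length,) = HEADER.unpack_from(data, pos)
        start = pos + HEADER.size
        end = start + length
        if end + COMMIT.size > len(data):
            break  # torn tail: the record was never fully written
        payload = data[start:end]
        magic, crc = COMMIT.unpack_from(data, end)
        if magic != MAGIC or crc != zlib.crc32(payload):
            break  # zeros or stale data: discard this and everything after it
        good.append(payload)
        pos = end + COMMIT.size
    return good
```

Because the CRC covers the payload, it does not matter in which order the payload and the commit record reached the disk: a commit record that arrived without its data simply fails the check on replay.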
So if you want to be technically correct, what you need to do is either

(a) write the transaction blocks, fsync, then write the commit record, and then fsync a second time, or

(b) write the transaction blocks, write the commit block with a CRC, and then fsync; then on replay, check the CRC in the commit block, and if the CRC does not check out, discard the last transaction, since it wasn't fully committed to stable store before the crash.

(Yes, storage is hard. The reason it's hard is that users insist on extreme performance, and so POSIX guarantees are fairly loose. They have to be, or everyday performance would be horrific. What this does mean is that if you want transaction / atomic guarantees, you have all of the low-level tools, but it's up to the application programmer or the database library implementor to use those tools correctly.)

Best regards,

					- Ted

_______________________________________________
isync-devel mailing list
isync-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/isync-devel