On 03/14/2018 08:27 PM, Austin S. Hemmelgarn wrote:
> On 2018-03-14 14:39, Goffredo Baroncelli wrote:
>> On 03/14/2018 01:02 PM, Austin S. Hemmelgarn wrote:
>> [...]
>>>>
>>>> In btrfs, a checksum mismatch creates an -EIO error during the reading. In
>>>> a conventional filesystem (or a btrfs filesystem w/o datasum) there is no
>>>> checksum, so this problem doesn't exist.
>>>>
>>>> I am curious how ZFS solves this problem.
>>> It doesn't support disabling COW or the O_DIRECT flag, so it just never has
>>> the problem in the first place.
>>
>> I would like to perform some tests: however I think that you are right. if
>> you make a "double buffering" approach (copy the data in the page cache,
>> compute the checksum, then write the data to disk), the mismatch should not
>> happen. Of course this is incompatible with O_DIRECT; but disabling O_DIRECT
>> is a prerequisite to the "double buffering"; alone it couldn't be
>> sufficient; what about mmap ? Are we sure that this does a double buffering ?
> There's a whole lot of applications that would be showing some pretty serious
> issues if checksumming didn't work correctly with mmap(), so I think it does
> work correctly given that we don't have hordes of angry users and sysadmins
> beating down the doors.

I tried updating a page and writing it in parallel from different threads, and
I was unable to reproduce a checksum mismatch, so mmap seems to be safe from
this point of view.

>>
>> I would prefer that btrfs doesn't allow O_DIRECT with the COW files. I
>> prefer this to the checksum mismatch bug.
> This is only reasonable if you are writing to the files. Checksums appear to
> be checked on O_DIRECT reads, and outside of databases and VM's, read-only
> access accounts for a significant percentage of O_DIRECT usage, partly
> because it is needed for AIO support (nginx for example can serve files using
> AIO and O_DIRECT and gets a pretty serious performance boost on heavily
> loaded systems by doing so).
>

So O_DIRECT should be unsupported/ignored only for writes? That could be a
good compromise...

BR
G.Baroncelli

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
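
P.S. The "double buffering" idea discussed in this thread (copy the data,
compute the checksum on the copy, then write the copy) can be sketched in user
space. This is only an illustration in Python, not btrfs code. The function
names and the `mutate` hook, which stands in for a second thread touching the
buffer at the worst possible moment, are assumptions made for the example, and
zlib.crc32 is used as a stand-in for the crc32c that btrfs uses by default.

```python
import zlib


def write_direct(buf, mutate):
    """O_DIRECT-like path: checksum the caller's live buffer, then write it.

    `mutate` is a hypothetical hook simulating another thread that changes
    the buffer between the checksum computation and the actual write.
    """
    csum = zlib.crc32(buf)   # checksum taken from the live buffer
    mutate(buf)              # concurrent modification happens here
    on_disk = bytes(buf)     # the modified data is what reaches the disk
    return on_disk, csum


def write_double_buffered(buf, mutate):
    """Buffered path: take a private copy first, then checksum and write it."""
    snapshot = bytes(buf)    # private copy (plays the role of the page cache)
    csum = zlib.crc32(snapshot)
    mutate(buf)              # the caller's buffer changes, the snapshot does not
    on_disk = snapshot       # data and checksum come from the same bytes
    return on_disk, csum


def verify(on_disk, csum):
    """What a later read would do: recompute the checksum and compare."""
    return zlib.crc32(on_disk) == csum


def flip_first_byte(buf):
    """Example mutation: invert the first byte of the buffer in place."""
    buf[0] ^= 0xFF


if __name__ == "__main__":
    for name, writer in (("direct", write_direct),
                         ("double buffered", write_double_buffered)):
        data, csum = writer(bytearray(b"A" * 4096), flip_first_byte)
        print(name, "->", "ok" if verify(data, csum) else "checksum mismatch")
```

In the first function the checksum and the written data can come from two
different states of the buffer, which is the mismatch that later shows up as
-EIO on read. In the second one both are derived from the same private copy,
so a later change to the caller's buffer cannot break the pair.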
