On 03/14/2018 08:27 PM, Austin S. Hemmelgarn wrote:
> On 2018-03-14 14:39, Goffredo Baroncelli wrote:
>> On 03/14/2018 01:02 PM, Austin S. Hemmelgarn wrote:
>> [...]
>>>>
>>>> In btrfs, a checksum mismatch causes an -EIO error on read. In a
>>>> conventional filesystem (or a btrfs filesystem without datasum) there is
>>>> no checksum, so this problem doesn't exist.
>>>>
>>>> I am curious how ZFS solves this problem.
>>> It doesn't support disabling COW or the O_DIRECT flag, so it just never has 
>>> the problem in the first place.
>>
>> I would like to run some tests, but I think that you are right: with a
>> "double buffering" approach (copy the data into the page cache, compute the
>> checksum on that copy, then write the data to disk), the mismatch should not
>> happen. Of course this is incompatible with O_DIRECT; but while disabling
>> O_DIRECT is a prerequisite for "double buffering", on its own it may not be
>> sufficient. What about mmap? Are we sure that it does double buffering?
> There are a whole lot of applications that would be showing some pretty
> serious issues if checksumming didn't work correctly with mmap(), so given
> that we don't have hordes of angry users and sysadmins beating down the
> doors, I think it does work correctly.
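
The "double buffering" argument can be illustrated with a small user-space simulation. This is a minimal sketch, not btrfs code: toy_csum() is a hypothetical FNV-1a stand-in for the crc32c that btrfs uses, disk_block and the other names are made up for illustration, and the "racer" callback deterministically plays the role of a second thread modifying the buffer between checksumming and the device copy.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16

/* Toy checksum (32-bit FNV-1a), standing in for btrfs's crc32c;
 * chosen only to keep the sketch self-contained. */
static uint32_t toy_csum(const unsigned char *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* One simulated on-disk block plus its stored checksum. */
struct disk_block {
    unsigned char data[BLOCK_SIZE];
    uint32_t csum;
};

/* Models another thread touching the caller's buffer; invoked between
 * checksumming and the simulated device copy. */
typedef void (*racer_fn)(unsigned char *buf);

/* O_DIRECT-like path: checksum the caller's buffer, then the "device"
 * copies from that same buffer, which may have changed meanwhile. */
static void write_direct(struct disk_block *d, unsigned char *user, racer_fn racer)
{
    d->csum = toy_csum(user, BLOCK_SIZE);
    if (racer)
        racer(user);
    memcpy(d->data, user, BLOCK_SIZE);
}

/* Double-buffered path: snapshot into a private buffer (standing in for
 * the page cache), checksum the snapshot, and write the snapshot. */
static void write_double_buffered(struct disk_block *d, unsigned char *user, racer_fn racer)
{
    unsigned char stable[BLOCK_SIZE];
    memcpy(stable, user, BLOCK_SIZE);
    d->csum = toy_csum(stable, BLOCK_SIZE);
    if (racer)
        racer(user);               /* too late to affect what is written */
    memcpy(d->data, stable, BLOCK_SIZE);
}

/* Read path: verify the stored checksum, failing with -EIO on mismatch. */
static int read_verified(const struct disk_block *d, unsigned char *out)
{
    assert(d != NULL && out != NULL);
    if (toy_csum(d->data, BLOCK_SIZE) != d->csum)
        return -EIO;
    memcpy(out, d->data, BLOCK_SIZE);
    return 0;
}

/* Example racer: flips the first byte of the buffer. */
static void flip_first_byte(unsigned char *buf)
{
    buf[0] ^= 0xFF;
}
```

With flip_first_byte as the racer, a block written by write_direct() reads back as -EIO, while the same race against write_double_buffered() reads back cleanly. The price is the extra copy, which is precisely what O_DIRECT is meant to avoid.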

I tried updating an mmap()ed page in one thread while writing in a different
thread, in parallel; I was unable to reproduce a checksum mismatch, so mmap
seems to be safe from this point of view.
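
A minimal sketch of this kind of test, assuming Linux and POSIX threads. It is a hypothetical reconstruction, not the exact program used: the names mmap_flush_race() and scribble_page(), the round count, the use of msync(MS_SYNC) as the "writing" side, and the use of posix_fadvise(POSIX_FADV_DONTNEED) to force the re-read to come from disk are all assumptions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <sys/mman.h>
#include <unistd.h>

#define PAGE_SZ 4096

struct race {
    unsigned char *map;
    atomic_bool stop;
};

/* Thread A: keeps rewriting the mapped page until told to stop. */
static void *scribble_page(void *arg)
{
    struct race *r = arg;
    unsigned char v = 0;
    while (!atomic_load(&r->stop))
        for (size_t i = 0; i < PAGE_SZ; i++)
            r->map[i] = v++;
    return NULL;
}

/* Thread B (the caller): flushes the page while thread A modifies it,
 * then evicts it and reads it back.  Returns 0 if every read succeeded,
 * or a negative errno (-EIO would indicate a checksum failure). */
static int mmap_flush_race(const char *path, int rounds)
{
    assert(path != NULL && rounds > 0);
    unsigned char buf[PAGE_SZ];
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -errno;
    if (ftruncate(fd, PAGE_SZ) < 0) {
        int e = errno;
        close(fd);
        return -e;
    }

    int ret = 0;
    for (int n = 0; n < rounds && ret == 0; n++) {
        struct race r = { .map = NULL, .stop = false };
        r.map = mmap(NULL, PAGE_SZ, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (r.map == MAP_FAILED) {
            ret = -errno;
            break;
        }
        pthread_t t;
        int e = pthread_create(&t, NULL, scribble_page, &r);
        if (e != 0) {
            munmap(r.map, PAGE_SZ);
            ret = -e;
            break;
        }
        for (int i = 0; i < 8; i++)
            msync(r.map, PAGE_SZ, MS_SYNC);   /* writeback of a page under modification */
        atomic_store(&r.stop, true);
        pthread_join(t, NULL);
        munmap(r.map, PAGE_SZ);
        fsync(fd);
        /* Best effort: drop the clean cached page so pread() goes to disk
         * and the data checksum is verified again. */
        posix_fadvise(fd, 0, PAGE_SZ, POSIX_FADV_DONTNEED);
        if (pread(fd, buf, PAGE_SZ, 0) < 0)
            ret = -errno;
    }
    close(fd);
    return ret;
}
```

To be meaningful, the path must be on a btrfs filesystem with data checksums enabled (not nodatasum/nodatacow). Since POSIX_FADV_DONTNEED is only advisory, a clean run is evidence rather than proof that the race cannot happen.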

>>
>> I would prefer that btrfs not allow O_DIRECT on COW files; I prefer this
>> to the checksum mismatch bug.
> This is only reasonable if you are writing to the files.  Checksums appear to 
> be checked on O_DIRECT reads, and outside of databases and VMs, read-only
> access accounts for a significant percentage of O_DIRECT usage, partly 
> because it is needed for AIO support (nginx for example can serve files using 
> AIO and O_DIRECT and gets a pretty serious performance boost on heavily 
> loaded systems by doing so).
> 

So O_DIRECT should be unsupported/ignored only for writes? That could be a
good compromise...
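
The compromise can be expressed as a small decision function. This is only a sketch of the policy being proposed, not existing btrfs behaviour; choose_io_path(), enum io_path, and the has_datasum flag (true for a COW file with data checksums) are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: how a request would be served under the
 * "ignore O_DIRECT only for writes" proposal. */
enum io_path { IO_BUFFERED, IO_DIRECT };

static enum io_path choose_io_path(bool o_direct, bool is_write, bool has_datasum)
{
    if (!o_direct)
        return IO_BUFFERED;   /* ordinary page-cache I/O */
    if (is_write && has_datasum)
        return IO_BUFFERED;   /* checksum a stable page-cache copy, not the user buffer */
    return IO_DIRECT;         /* reads are still verified; nodatasum files have no checksum to break */
}
```

Under this policy, O_DIRECT/AIO reads such as the nginx case keep their fast path, while an O_DIRECT write to a checksummed file is silently downgraded to a buffered write.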

BR
G.Baroncelli
-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5