Raul,

What do you mean?


What I wanted to say here is that, with respect to data safety, there are two classes of filesystems around: on the one hand those with a whole-disk hash (ZFS, and I think maybe btrfs and Hammer2), and on the other those without.


I agree with you that "in practice, how do filesystems break" is an essential question to ask, as that question implies "in practice, how do filesystems lose their data safety".

But that question is also analogous to "is it worthwhile to download and check the SHA256/MD5 hash separately when downloading a file from the web".
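
To make the analogy concrete, a minimal sketch (Python, just for illustration; the file name and expected digest are placeholders) of that separate-hash check would be:

    import hashlib

    def sha256_of(path, chunk_size=1 << 20):
        # Stream the file in chunks so even large downloads fit in memory.
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    expected = "<digest published next to the download>"
    if sha256_of("download.iso") != expected:
        print("checksum mismatch: not the file that was published")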


In comparison, Karel's RAID1C in its present form would be like downloading the file twice, together with per-block CRC32 hashes for each copy, and then comparing the two copies to know you got the right thing.

That's nice as it provides some automatic healing, but it has the limitation of the extra space used, and it's still not safe against misdirected writes, not even within the time that it's continuously mounted.
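
In code terms, that comparison would look roughly like this (a purely illustrative Python sketch, not how RAID1C is actually implemented):

    import zlib

    BLOCK = 4096

    def per_block_crcs(data):
        # CRC32 of each fixed-size block of one copy.
        return [zlib.crc32(data[off:off + BLOCK])
                for off in range(0, len(data), BLOCK)]

    def copies_agree(copy_a, copy_b):
        # A misdirected write normally lands at the same wrong location on
        # both copies, with an internally valid CRC, so this check still
        # passes - that is the gap a whole-disk hash would close.
        return copy_a == copy_b and per_block_crcs(copy_a) == per_block_crcs(copy_b)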

Just hashing the whole disk (and keeping that hash in RAM for the whole period that the disk is in use) seems to me like a pretty inexpensive and "lean and mean" way to get data safety guarantees.
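
What I mean is roughly this (Python sketch only; the device path is an assumption, and a real implementation would live in the kernel and maintain the digest incrementally on writes rather than rescanning the disk):

    import hashlib

    def whole_disk_hash(device="/dev/sd0c", chunk_size=1 << 20):
        # Stream the raw device and reduce it to a single digest.
        h = hashlib.sha256()
        with open(device, "rb") as dev:
            for chunk in iter(lambda: dev.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    # Held in RAM for as long as the disk is in use, and compared again later.
    in_ram_digest = whole_disk_hash()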


We do know that disks fail in all kinds of ways, some less and some more incredible, and we do see that ordinary filesystems would not detect misdirected writes at the location where they land. The question I wanted to pose with this conversation was how to maximize data safety -

Sorry for kind of pushing a particular way of thinking here, but to some extent this is an algorithmic conversation, where the exact ways physical disks predominantly fail do not matter.

I agree that how widely this kind of hashing is worth using is an interesting question, both in understanding what overhead it implies performance-wise, and how frequently its unique safety benefits actually are of practical value -

I guess maybe the only way to get that answered would be to actually implement it, and then also implement a routine that detects when the whole-disk hash was uniquely needed to find a fault, as that can be detected easily (complementing a sysctl diskhashing.detected_breach with a sysctl diskhashing.was_i_uniquely_needed).

This way, the performance overhead compared to an ordinary non-hashed FS can be evaluated by ordinary IO tests, and the practical value can be evaluated by users monitoring the two sysctls and measuring how often diskhashing.was_i_uniquely_needed is set when diskhashing.detected_breach is set.
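
Something as small as this could do the monitoring, assuming the two proposed sysctls existed as integer counters (they are hypothetical at this point, nothing ships them):

    import subprocess

    def read_sysctl(name):
        # "sysctl -n" prints just the value of the named variable.
        out = subprocess.run(["sysctl", "-n", name],
                             capture_output=True, text=True, check=True)
        return int(out.stdout.strip())

    breaches = read_sysctl("diskhashing.detected_breach")
    unique = read_sysctl("diskhashing.was_i_uniquely_needed")
    if breaches:
        print("whole-disk hashing was uniquely needed in %d of %d detections"
              % (unique, breaches))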


And finally, of course, an important question is exactly how the disk hashing scheme would best be implemented, and how disks break in practice would be central to answering that. But, at least for me, if I just know there's strong hashing (and I can get a copy of the disk's total hash at unmount and mount time), I trust that enough and that's all I need -
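
As a sketch of just that interface (Python again; where the digest is kept between unmount and mount is a made-up detail):

    import json

    STATE = "/var/db/diskhash_sd0.json"

    def record_at_unmount(digest):
        with open(STATE, "w") as f:
            json.dump({"digest": digest}, f)

    def verify_at_mount(current_digest):
        with open(STATE) as f:
            stored = json.load(f)["digest"]
        # False means the disk changed while it was offline.
        return stored == current_digest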

I just want a catch-all data safety mechanism that safeguards against every type of disk breakdown, that's all.


What do you say about this way of reasoning?


Thanks,
Tinker


https://en.wikipedia.org/wiki/Btrfs#Checksum_tree_and_scrubbing

https://en.wikipedia.org/wiki/HAMMER



On 2015-12-02 10:17, Raul Miller wrote:
This gives essentially zero information with which to compare the
relative failure rates between file system implementations.
..

But I guess it's good to hear how you would be happy?

Thanks,
