Hi Karel,
Glad to talk to you.
Why the extra IO expense?
About the Fletcher vs not Fletcher thing, can you please explain to me
what happens in a setup where I have one single disk with one single
RAID partition on it using your discipline, and:
1) I write a sector/block on some position X
2) My disk's allocation table gets messed up, so that block ends up at
another random position Y
3) I read sector/block on position Y
4) Also I read sector/block on position X
Maybe an advantage of the Fletcher thing is that, as I understood it,
it's in a way a "tree-ed checksum" structure, so the disk has a "root
checksum" covering the whole disk, which is also updated at write time,
i.e. in step 1) (along with any hash tree levels between the root and
the position X being written).
This means that not only would 3) here report failure, but also 4),
which is perfect, i.e. the Fletcher thing would catch *any*
inconsistency anywhere on the disk.
Maybe it could be argued that it's "too picky" for some less data-safe
environments, but, in a place where you have good backups and you value
100.0% fread() correctness, it's awesome!!
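
To make steps 1) to 4) concrete, here is a minimal sketch of a "tree-ed
checksum" over a toy disk. It is not softraid code: SHA-256 from
Python's hashlib stands in for whatever checksum would really be used,
the names build_tree and read_ok are made up for illustration, and the
broken mapping is modelled as positions X and Y trading contents.

```python
import hashlib

def h(data: bytes) -> bytes:
    # Stand-in checksum (assumption): SHA-256 instead of Fletcher.
    return hashlib.sha256(data).digest()

def build_tree(blocks):
    """Return checksum levels, leaves first; the last level is [root]."""
    levels = [[h(b) for b in blocks]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Each parent covers the concatenated checksums of two children.
        levels.append([h(b"".join(prev[i:i + 2]))
                       for i in range(0, len(prev), 2)])
    return levels

def read_ok(disk, levels, pos):
    """A read passes only if the block matches the leaf stored for pos."""
    return h(disk[pos]) == levels[0][pos]

# Step 1: write a block at position X and update the tree up to the root.
disk = [bytes([i]) * 16 for i in range(8)]
X, Y = 2, 5
disk[X] = b"payload written at X"
levels = build_tree(disk)
trusted_root = levels[-1][0]

# Step 2: the mapping breaks; modelled here as X and Y trading contents.
disk[X], disk[Y] = disk[Y], disk[X]

# Steps 3 and 4: check the reads at Y and at X, then the whole-disk root.
print("read Y ok:", read_ok(disk, levels, Y))
print("read X ok:", read_ok(disk, levels, X))
print("root ok:  ", build_tree(disk)[-1][0] == trusted_root)
```

In this toy model both reads fail and the recomputed root no longer
matches the trusted one, while untouched positions still verify.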
Looking forward to your response, thanks!
Tinker
On 2015-12-02 03:22, Karel Gardas wrote:
I don't know about fletcher, but I'm working on crc32-based
checksumming for softraid RAID1. The basic implementation is ready, but
I'm not satisfied with write performance in some cases: small files,
lots of collisions in chksumming blocks, etc. In the worst case I see
6-7x slower performance than plain RAID1. I've tried to improve that
with a cache for the chksumming blocks, which I've been working on for
the last few weekends, but it is still not right, and on a 32k-block fs
the improvements are not worth the much higher complexity of the code.
So I'll probably switch to scrub hacking, which is something you
usually need with chksumming anyway. :-)
On the bright side: the code happily "self-heals" bad blocks and
refuses to hand you bad data when all chunks have errors. Also, thanks
to the simplicity of the design, if something goes really badly you can
still detach the drive, attach it as a plain RAID1, and get your data
out.
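
The self-heal and refuse behaviour described above could look roughly
like the sketch below. This is only an illustration in Python, not the
actual patch: the Chunk class, mirror_read, and the 16-byte blocks are
hypothetical, and zlib.crc32 stands in for the crc32 code in the driver.

```python
import zlib

class Chunk:
    """One mirror member: block data plus a crc32 per block (toy model)."""
    def __init__(self, nblocks, bs=16):
        self.data = [bytes(bs) for _ in range(nblocks)]
        self.sums = [zlib.crc32(b) for b in self.data]

    def write(self, n, block):
        self.data[n] = block
        self.sums[n] = zlib.crc32(block)

    def ok(self, n):
        return zlib.crc32(self.data[n]) == self.sums[n]

def mirror_read(chunks, n):
    """Return block n from the first chunk that verifies, healing the rest."""
    good = next((c for c in chunks if c.ok(n)), None)
    if good is None:
        # No chunk has a block matching its checksum: refuse to return data.
        raise IOError(f"block {n}: checksum mismatch on all chunks")
    block = good.data[n]
    for c in chunks:
        if not c.ok(n):
            c.write(n, block)  # self-heal the bad copy from the good one
    return block

mirror = [Chunk(4), Chunk(4)]
payload = b"data" + bytes(12)
for c in mirror:
    c.write(1, payload)

mirror[0].data[1] = bytes(16)        # silent corruption on one chunk
healed = mirror_read(mirror, 1)      # served from chunk 1
healed_copy = mirror[0].data[1]      # chunk 0 has been rewritten

for c in mirror:                     # now corrupt the block on all chunks
    c.data[1] = bytes(16)
try:
    mirror_read(mirror, 1)
    refused = False
except IOError:
    refused = True
print(healed == payload, healed_copy == payload, refused)
```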
W.r.t. performance, read is at 70-80% of plain RAID1 and write of big
data (>=32k on a 32k-block fs) is about 70% of plain RAID1. PostgreSQL
pgbench also runs at about 70% of the speed of RAID1 (again on a
32k-block fs). Only small files suck. W.r.t. fletcher, I think we don't
need it and will still be able to detect a moved block. That's due to
the layout, which is really simple:
<SR metadata><data area><chksum area>.
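
As a rough illustration of why that layout can catch a moved block,
here is a toy model of it. The sizes, offsets, and function names are
assumptions made up for the example (the real on-disk format is not
described in this thread), and zlib.crc32 stands in for the driver's
crc32. The checksum for block n lives at a fixed slot n in the chksum
area, so data that turns up at the wrong position does not match the
slot for that position.

```python
import struct
import zlib

BS = 16                             # block size in bytes (tiny, for show)
NBLOCKS = 8
META = 32                           # hypothetical size of the SR metadata
DATA_OFF = META                     # data area follows the metadata
SUM_OFF = DATA_OFF + NBLOCKS * BS   # chksum area follows the data area

def put_raw(vol, n, block):
    """Store block n in the data area without touching the chksum area."""
    vol[DATA_OFF + n * BS:DATA_OFF + (n + 1) * BS] = block

def write_block(vol, n, block):
    """Normal write path: store the data and its crc32 in slot n."""
    assert len(block) == BS
    put_raw(vol, n, block)
    struct.pack_into("<I", vol, SUM_OFF + n * 4, zlib.crc32(block))

def read_block(vol, n):
    """Normal read path: verify the data against the crc32 in slot n."""
    block = bytes(vol[DATA_OFF + n * BS:DATA_OFF + (n + 1) * BS])
    (stored,) = struct.unpack_from("<I", vol, SUM_OFF + n * 4)
    if zlib.crc32(block) != stored:
        raise IOError(f"block {n}: checksum mismatch")
    return block

def new_volume():
    vol = bytearray(SUM_OFF + NBLOCKS * 4)
    for n in range(NBLOCKS):
        write_block(vol, n, bytes(BS))
    return vol

vol = new_volume()
X, Y = 3, 6
payload = b"data" + bytes(12)
write_block(vol, X, payload)

# The drive misplaces the data: it shows up at Y, and X holds stale zeros.
put_raw(vol, Y, payload)
put_raw(vol, X, bytes(BS))

for pos in (X, Y):
    try:
        read_block(vol, pos)
        print(pos, "read ok")
    except IOError as err:
        print(err)
```

This only covers data that moves while the chksum area stays put; it
says nothing about damage to the chksum area itself.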
Are you willing to test the code on your setup? If so, I can put the
patch somewhere for you, but my tree is a month old or so, if you
don't mind...
PS: all performance figures were measured on a Haswell-based server
with 2 WD Re drives with 512-byte sectors (physical size). So your
numbers may vary, and I'm certainly interested to know them -- if you
benchmark.
On Tue, Dec 1, 2015 at 6:31 PM, Tinker <[email protected]> wrote:
Hi!
I heard someone was working on implementing Fletcher checksums in
softraid.
Do you know of any updates on this?
Fletcher checksums are how OpenBSD would guarantee that the data you
read from disk actually has integrity. What makes it different from
traditional checksumming is that it not only guarantees that a
sector/block of data read has integrity within itself, but also that it
actually belonged in the place on the disk that it was read from.
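
For reference, a minimal sketch of Fletcher-16 as described on the
Wikipedia page linked below. How the checksum gets tied to a position
on disk is not spelled out in this message; the block_checksum helper
is purely an assumption for illustration (it feeds the block address
into the checksum together with the data) and is not taken from
softraid or any other implementation.

```python
def fletcher16(data: bytes) -> int:
    """Fletcher-16: two running sums modulo 255 over the input bytes."""
    sum1 = sum2 = 0
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1

def block_checksum(lba: int, data: bytes) -> int:
    """Hypothetical position-bound variant: checksum address plus data."""
    return fletcher16(lba.to_bytes(8, "big") + data)

print(hex(fletcher16(b"abcde")))  # 0xc8f0

# The same data yields a different checksum at a different address.
block = b"x" * 16
print(block_checksum(1, block) != block_checksum(2, block))  # True
```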
This is of particular importance when you have sensitive information on
disks with sector mapping, which all SSDs (and even magnetic disks,
or??) have and which can break down.
For this reason, with ordinary filesystems, reading file contents could
give you just about any data that's anywhere on the disk, while a
Fletcher-based disk would give you a read error.
So it's really like a night and day difference.
https://en.wikipedia.org/wiki/Fletcher%27s_checksum
Thanks!
Tinker