Tried something like that in the past: https://marc.info/?l=openbsd-tech&m=144217941801350&w=2
It worked kind of OK, except for the performance. The problem is that the data layout turns every read op. into 2x read ops., and every write op. into a read op. plus 2x write ops., which is not a speed winner. Caching of the checksum blocks helped a lot in some cases, but was never submitted, since ideally you would also need readahead, and that was not done at all. The other performance issue is that putting this slow virtual drive implementation under the already slow ffs is a recipe for disappointment from the performance point of view. Certainly no speed daemon, and certainly in a completely different league than the checksumming-capable filesystems from the open-source world (ZFS, btrfs, bcachefs; no, HAMMER2 is not there, since it checksums only metadata, not user data, and can't self-heal).
Yes, you are right that ideally the drive would be fs-aware to optimize the rebuild, but this may be worked around by a more clever layout that also marks the used blocks. Anyway, that (and the above) are IMHO the reasons why development is done on checksumming filesystems instead of checksumming software RAIDs. I read a paper somewhere about Linux's mdadm hacked to do checksums, and the result was pretty much the same (IIRC!), i.e. a performance disappointment. If you are curious, google for it.
So, work on it if you can tolerate the speed...

On 1/12/20 6:46 AM, Constantine A. Murenin wrote:
Dear misc@,

I'm curious if anyone has any sort of tools / patches to verify the consistency of softraid(4) RAID1 volumes?

If one adds a new disc (i.e. chunk) to a volume with the RAID1 discipline, the resilvering process of softraid(4) will read data from one of the existing discs and write it back to all the discs, ridding you of the artefacts that could potentially be used to reconstruct the flipped bits correctly.

Additionally, this resilvering process is also really slow. Per my notes from a few years ago, softraid has a fixed block size of 64KB (MAXPHYS); if we're talking about spindle-based HDDs, they only support around 80 random IOPS at 7,2k RPM, half of which we gotta use for reads, half for writes; this means it'll take (1TB/64KB/(80/s/2)) = 4,5 days to resilver each 1TB of an average 7,2k RPM HDD. Compare this with sequential resilvering, which would take (1TB/(120MB/s)) = 2,3 hours. The reality may vary from these imprecise calculations, but these numbers do seem representative of the experience.
The above behaviour is defined here:
http://bxr.su/o/sys/dev/softraid_raid1.c#sr_raid1_rw

	} else {
		/* writes go on all working disks */
		chunk = i;
		scp = sd->sd_vol.sv_chunks[chunk];
		switch (scp->src_meta.scm_status) {
		case BIOC_SDONLINE:
		case BIOC_SDSCRUB:
		case BIOC_SDREBUILD:
			break;

		case BIOC_SDHOTSPARE: /* should never happen */
		case BIOC_SDOFFLINE:
			continue;

		default:
			goto bad;
		}
	}

What we could do is something like the following: pretend that any online chunk is not available for writes when the wu (work unit) we're handling is part of the rebuild process from http://bxr.su/o/sys/dev/softraid.c#sr_rebuild, mimicking the BIOC_SDOFFLINE behaviour for BIOC_SDONLINE chunks (discs) when the SR_WUF_REBUILD flag is set for the work unit:

 	switch (scp->src_meta.scm_status) {
 	case BIOC_SDONLINE:
+		if (wu->swu_flags & SR_WUF_REBUILD)
+			continue; /* must be same as BIOC_SDOFFLINE case */
+		/* FALLTHROUGH */
 	case BIOC_SDSCRUB:
 	case BIOC_SDREBUILD:

Obviously, there are both pros and cons to such an approach; I've tested a variation of the above in production (not a fan of weeks-long random-read/write rebuilds); but use this at your own risk, obviously.

... But back to the original problem: this consistency check would have to be filesystem-specific, because we gotta know which blocks of softraid have and have not been used by the filesystem, as softraid itself is filesystem-agnostic. I'd imagine it'll be somewhat similar in concept to the fstrim(8) utility on GNU/Linux -- http://man7.org/linux/man-pages/man8/fstrim.8.html -- and would also open the door for cron-based TRIM support as well (it would also have to know the softraid format itself, too).

Any pointers or hints on where to get started, or whether anyone has worked on this in the past?

Cheers,
Constantine.

http://cm.su/