True, ZFS does offer some benefit here, though with Oracle dropping
OpenSolaris completely, that means you are stuck with Solaris x86 or
Solaris SPARC. Btrfs is a long way from being a real option for
production use (it will take years to reach the same level of field
debugging that ZFS has had).
On the point about large drives and their other availability issues
(rebuild speeds in relation to capacity): it's generally good to limit
the stripe width to, say, 8 drives or fewer. If more space is needed,
use multiple sets of drives.
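As a rough illustration (a Python sketch; the 2 TB capacity and 50 MB/s
sustained rebuild rate are assumed figures, not measurements), the amount
of data a parity rebuild has to read grows linearly with stripe width, so
narrower sets both read less and keep the degraded window short:

    def rebuild_exposure(drive_tb, width, rate_mb_s=50):
        # Bytes read from the surviving members during a parity rebuild,
        # plus a rough degraded-window estimate (limited by how fast the
        # replacement drive can be rewritten).
        bytes_read = (width - 1) * drive_tb * 1e12
        hours = drive_tb * 1e12 / (rate_mb_s * 1e6) / 3600
        return bytes_read, hours

    for width in (4, 8, 16):
        tb_read, hours = rebuild_exposure(2, width)
        print(f"{width}-drive set: {tb_read / 1e12:.0f} TB read, "
              f"~{hours:.0f} h degraded")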
Though I should add that for the OP's issue this is still speculation,
albeit with a good probability of being what is happening, if the md5s
come up fine on a manual check.
Steve
On 2010-08-28 10:52, Paul Mather wrote:
On Aug 28, 2010, at 7:12 AM, Steve Costaras wrote:
It could be due to a transient error (a transmission error, or a
wild/torn read at the time of calculation). I see this a lot with
integrity checking of files here (50 TiB of storage).
The only way to get around this now is to take a known-good SHA-1/MD5
hash of the data (do 2-3 reads of the file and make sure they all match,
so that you know the file is not corrupted) and save that as a baseline.
Then, when a later read/compare fails, do another re-read to see whether
the first read was in error, and compare the result with your baseline.
This is one reason why I'm switching to the new generation of SAS drives
that have IOECC checks on reads, not just writes, to help cut down on
some of this.
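For what it's worth, a minimal sketch of that procedure in Python (the
chunk size, the three agreeing reads, and the single retry are arbitrary
choices here, not anything specified above):

    import hashlib

    def sha1_of(path, chunk=1 << 20):
        h = hashlib.sha1()
        with open(path, "rb") as f:
            while True:
                block = f.read(chunk)
                if not block:
                    break
                h.update(block)
        return h.hexdigest()

    def baseline(path, reads=3):
        # Hash the file several times; only trust the result if every
        # read agrees.
        digests = {sha1_of(path) for _ in range(reads)}
        if len(digests) != 1:
            raise IOError(f"unstable reads of {path}: {digests}")
        return digests.pop()

    def verify(path, known_good):
        # On a mismatch, re-read once to rule out a transient/wild read.
        if sha1_of(path) == known_good:
            return "ok"
        if sha1_of(path) == known_good:
            return "ok (first read was in error)"
        return "CORRUPT: both re-reads differ from the baseline"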
Corruption does occur as well, and it is more probable the higher the
capacity of the drive. Ideally you would have a drive that does IOECC
on reads, plus use the T10 PI extensions (DIX/DIF) from the drive to the
controller and up to your file system layer. That won't always prevent
corruption by itself, but it would allow a RAID setup to do some
self-healing when a drive reports a non-transient error (i.e. a
corrupted sector of data).
However, the T10 PI extensions are only available on SAS/FC drives
(520/528-byte blocks); as far as I can tell only the new LSI HBAs
support a small subset of this (no hardware RAID controllers that I can
find), and I have not seen any support up to the OS/filesystem level.
SATA is not included at all, as the T13 group opted not to include it
in the spec.
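For reference, the protection information those 520-byte blocks carry is
an 8-byte tuple after each 512 data bytes: a 2-byte guard tag (a CRC-16
of the sector, polynomial 0x8BB7), a 2-byte application tag, and a
4-byte reference tag (normally the low 32 bits of the LBA). A Python
sketch just to show the layout:

    import struct

    def crc16_t10dif(data):
        # CRC-16 used for the T10 DIF guard tag (poly 0x8BB7, init 0,
        # no bit reflection).
        crc = 0
        for byte in data:
            crc ^= byte << 8
            for _ in range(8):
                crc = (crc << 1) ^ 0x8BB7 if crc & 0x8000 else crc << 1
                crc &= 0xFFFF
        return crc

    def dif_tuple(sector, lba, app_tag=0):
        # 8 protection bytes: guard tag, application tag, reference tag.
        assert len(sector) == 512
        return struct.pack(">HHI", crc16_t10dif(sector), app_tag,
                           lba & 0xFFFFFFFF)

    print(dif_tuple(b"\x00" * 512, lba=1234).hex())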
You could also stick with your current hardware and use a file system
that emphasises end-to-end data integrity like ZFS. ZFS checksums at
many levels, and has a "don't trust the hardware" mentality. It can
detect silent data corruption and automatically self-heal where
redundancy permits.
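The mechanism is easy to sketch (Python here; this only illustrates the
concept, not how ZFS actually implements it, and the list of copies
stands in for whatever redundancy the pool has):

    import hashlib

    def read_with_selfheal(copies, offset, length, expected_sha1):
        # Check each redundant copy against the independently stored
        # checksum, then rewrite any bad copy from a good one.
        good, bad = None, []
        for path in copies:
            with open(path, "rb") as f:
                f.seek(offset)
                data = f.read(length)
            if hashlib.sha1(data).hexdigest() == expected_sha1:
                good = data
            else:
                bad.append(path)
        if good is None:
            raise IOError("all copies corrupt; nothing left to heal from")
        for path in bad:
            with open(path, "r+b") as f:   # self-heal the damaged copy
                f.seek(offset)
                f.write(good)
        return good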
ZFS also supports pool scrubbing---akin to the "patrol reading" of
many RAID controllers---for proactive detection of silent data
corruption. With drive capacities becoming very large, the
probability of an unrecoverable read becomes very high. This becomes
very significant even in redundant storage systems because a drive
failure necessitates a lengthy rebuild period during which the storage
array lacks any redundancy (in the case of RAID-5). It is for this
reason that RAID-6 (ZFS raidz2) is becoming de rigueur for
many-terabyte arrays using large drives, and, specifically, the reason
ZFS garnered its triple-parity raidz3 pool type (in ZFS pool version 17).
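To put a number on the unrecoverable-read risk, a back-of-the-envelope
calculation in Python (the 1-in-10^14 bit error rate is the figure
commonly quoted on consumer SATA data sheets, and the 7 x 2 TB RAID-5
is a made-up example):

    def p_ure_during_rebuild(drives, drive_tb, ber=1e-14):
        # Probability of at least one unrecoverable read error while
        # reading every surviving drive end-to-end to rebuild the failed
        # one.
        bits_read = (drives - 1) * drive_tb * 1e12 * 8
        return 1 - (1 - ber) ** bits_read

    print(f"{p_ure_during_rebuild(7, 2):.0%}")  # ~62% for 7 x 2 TB RAID-5

With odds like that, a second (or third) parity drive to fall back on
during the rebuild is what makes the difference.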
I believe Btrfs intends to bring many ZFS features to Linux.
Cheers,
Paul.