[DRBD-user] strange checksum error

Csurai Akos Wed, 01 Aug 2012 01:17:41 -0700

Hi,

We have been experiencing a strange replication problem since we started using protocol B.
The scenario is the following:

Some binary files are saved to the replicated IO pair (kernel 3.0.13, drbd-8.3.12, protocol B, EXT3).

Later they are copied to another (also replicated) directory.

They remain consistent, and there is no problem until io1 (the actual Primary) is rebooted. Strangely, it takes a reboot: an enforced role change does not show the symptom. io2 takes over the Primary role, and when the cluster starts using the binary files, they show checksum errors.

We turned off the write cache on the SAS disks ( sdparam --set WCE=0 /dev/sda )
and the symptom seemed to disappear, but later it surfaced again.
The corrupted binary files have a hole of some 40 kbytes filled with zeros.
Yes, it could be a HW issue, but we did not see it with protocol C
(which is deadly slow on our system, unfortunately).
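
For anyone who wants to check their own files for the same kind of damage, here is a minimal sketch that reports long runs of zero bytes in a file. It is only an illustration: the function name find_zero_runs and the 4096-byte default threshold are assumptions, not part of DRBD or of the setup described above, and a run of zeros can of course be legitimate data in some file formats.

```python
import re
import sys


def find_zero_runs(data: bytes, min_len: int = 4096):
    """Return (offset, length) for every run of zero bytes of at least min_len."""
    # Hypothetical helper: the pattern matches min_len or more consecutive NUL bytes.
    pattern = re.compile(rb"\x00{%d,}" % min_len)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(data)]


if __name__ == "__main__":
    # Usage: python find_zero_runs.py FILE [FILE ...]
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            content = fh.read()  # reads the whole file; fine for modest file sizes
        for offset, length in find_zero_runs(content):
            print(f"{path}: {length} zero bytes at offset {offset}")
```

Running it against the same file on both nodes would show whether the zero-filled region exists on one side only.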

Has anyone seen something similar?

Thanks,
Akos



_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
