On 25.09.2012 11:28, Lars Ellenberg wrote:
On Sun, Sep 23, 2012 at 12:18:57PM +0200, Markus Müller wrote:
Hello DRBD Users,
I have a two-node DRBD setup running and was alarmed by the LINBIT
mail about sync problems with newer kernels. So I updated to 8.4.2
and tried to make sure everything is fine now.
Even though the LINBIT mail says that no action is required after
upgrading, I tried the "drbdadm verify" feature. It found "oos:",
meaning blocks that are out of sync. I thought, okay, good that you
thought of that, and tried to fix this as described in the LINBIT mail
by doing "drbdadm disconnect/connect". It resynced the blocks reported
as "oos:" and I thought everything was fine, so I reran "drbdadm
verify" just to be sure. And it just found more "oos:"! I did another
"drbdadm disconnect/connect", but there were still more "oos:" after
the next "drbdadm verify". I repeated this several times and saw that
it does not fix the problem at all!
If this is while the device was idle,
it is an indication that your hardware flips bits.
If it happens while the device is in use,
certain usage patterns can cause blocks to be different,
search for "digest integrity explained" in the list archives.
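To track whether the out-of-sync count really grows between verify runs, the "oos:" counter can be read from /proc/drbd after each run. The following is only a minimal sketch: it assumes the 8.x /proc/drbd layout, in which each device has a status line starting with its minor number and a counter line containing "oos:<KiB>". The sample text is illustrative and not taken from the system discussed here, and parse_oos is a hypothetical helper name.

```python
import re

def parse_oos(proc_drbd_text):
    """Map each DRBD minor number to its out-of-sync counter (KiB)."""
    result = {}
    minor = None
    for line in proc_drbd_text.splitlines():
        # Device status line, e.g. " 0: cs:Connected ro:Primary/Secondary ..."
        m = re.match(r"\s*(\d+):\s+cs:", line)
        if m:
            minor = int(m.group(1))
        # Counter line of the current device, ending in e.g. "oos:96"
        m = re.search(r"\boos:(\d+)", line)
        if m and minor is not None:
            result[minor] = int(m.group(1))
    return result

# Illustrative sample only (assumed format, invented values).
SAMPLE = """\
version: 8.4.2
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:96
"""

if __name__ == "__main__":
    print(parse_oos(SAMPLE))  # prints {0: 96}
```

On a live node the input would be the contents of /proc/drbd, read once after each "drbdadm verify" has finished.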
DRBD was stopped by demoting the primary to secondary, and then I
ran "drbdadm down" on both nodes. Then I flushed the kernel cache
(echo 3 > /proc/sys/vm/drop_caches) on both sides and set up a new
nbd server on one node and a new nbd client on the other.
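For reference, an offline comparison of this kind can be sketched as below. This is not the diff.pl script whose output appears later in this message (that script is not shown); it is a minimal sketch that assumes both backing devices are readable as ordinary paths on one node, one of them through the nbd client. The 1 MiB chunk size, the function name, and the output format (decimal GB, loosely modeled on the listings) are all assumptions.

```python
import sys

CHUNK = 1024 * 1024  # assumed comparison granularity: 1 MiB

def differing_offsets(path_a, path_b, chunk=CHUNK):
    """Yield the byte offset of every chunk that differs between two paths."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        offset = 0
        while True:
            block_a = a.read(chunk)
            block_b = b.read(chunk)
            if not block_a and not block_b:
                break  # both inputs exhausted
            if block_a != block_b:
                yield offset
            # Advance by the longer read so a size mismatch is still reported.
            offset += max(len(block_a), len(block_b))

# Usage: python compare.py <local device> <nbd device>
if __name__ == "__main__" and len(sys.argv) == 3:
    pairs = enumerate(differing_offsets(sys.argv[1], sys.argv[2]), 1)
    for n, off in pairs:
        print(f"{n} bad {off / 1e9:.3f} GB")
```

Both devices must be idle and DRBD must be down on both nodes while this runs, as described above; otherwise in-flight writes would show up as false differences.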
I have tested this hardware thoroughly: it HAD and HAS no problems
without the drbd module!
I don't think it's a good idea to dismiss verifiable bugs by
insinuating buggy hardware!!!
LINBIT has already found bugs with new kernels; it just seems that
there are more of them than thought. And I have bad news for you: I
reactivated the array yesterday, used it, deactivated it again today,
and there is NEW INCONSISTENCY:
Yesterday's run:
root@as1:~# perl /root/diff.pl
1 bad 101.656 GB
2 bad 101.657 GB
3 bad 102.018 GB
4 bad 102.019 GB
5 bad 107.151 GB
6 bad 107.152 GB
7 bad 111.034 GB
8 bad 111.035 GB
9 bad 131.833 GB
10 bad 131.834 GB
11 bad 132.559 GB
12 bad 132.56 GB
13 bad 137.735 GB
14 bad 137.736 GB
15 bad 140.642 GB
16 bad 140.643 GB
17 bad 141.094 GB
18 bad 141.095 GB
19 bad 535.806 GB
20 bad 535.807 GB
21 bad 556.083 GB
22 bad 566.681 GB
23 bad 599.43 GB
24 bad 619.899 GB
root@as1:~#
Today's run:
root@as1:~# perl diff.pl
1 bad 66.044 GB
2 bad 79.641 GB
3 bad 82.567 GB
4 bad 82.57 GB
5 bad 82.578 GB
6 bad 82.593 GB
7 bad 111.034 GB
8 bad 111.035 GB
9 bad 123.787 GB
10 bad 123.788 GB
11 bad 131.833 GB
12 bad 131.834 GB
13 bad 132.559 GB
14 bad 132.56 GB
15 bad 139.435 GB
16 bad 139.436 GB
17 bad 140.664 GB
18 bad 140.665 GB
19 bad 149.93 GB
20 bad 149.938 GB
21 bad 198.326 GB
22 bad 217.039 GB
23 bad 217.042 GB
24 bad 217.044 GB
25 bad 217.045 GB
26 bad 217.049 GB
27 bad 249.926 GB
28 bad 265.254 GB
29 bad 265.255 GB
30 bad 284.159 GB
31 bad 284.164 GB
32 bad 284.17 GB
33 bad 284.172 GB
34 bad 295.717 GB
35 bad 295.718 GB
36 bad 378.504 GB
37 bad 378.506 GB
38 bad 378.508 GB
39 bad 399.445 GB
40 bad 416.755 GB
41 bad 528.304 GB
42 bad 528.311 GB
43 bad 528.312 GB
44 bad 528.313 GB
45 bad 528.314 GB
46 bad 528.315 GB
47 bad 528.321 GB
48 bad 528.322 GB
49 bad 528.335 GB
root@as1:~#
It seems that I now have different and more inconsistencies! This is
absolutely unacceptable.
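To see exactly which offsets persist between the two runs and which are new, the listings can be compared as sets. The sketch below assumes the "<n> bad <offset> GB" line format shown above; the function names are hypothetical and the embedded data is only a short excerpt of the two listings. Checked by hand against the full listings, six offsets appear in both runs (111.034, 111.035, 131.833, 131.834, 132.559 and 132.56 GB); all others differ.

```python
import re

LINE = re.compile(r"^\s*\d+\s+bad\s+(\d+(?:\.\d+)?)\s+GB\s*$")

def parse_offsets(report):
    """Collect the GB offsets (kept as strings) from '<n> bad <offset> GB' lines."""
    offsets = set()
    for line in report.splitlines():
        m = LINE.match(line)
        if m:
            offsets.add(m.group(1))
    return offsets

def compare_runs(old_report, new_report):
    """Split offsets into those in both runs, only the old run, only the new run."""
    old, new = parse_offsets(old_report), parse_offsets(new_report)
    return {
        "persisting": sorted(old & new, key=float),
        "gone": sorted(old - new, key=float),
        "new": sorted(new - old, key=float),
    }

# Short excerpts of the two listings above, not the full data.
YESTERDAY_EXCERPT = """\
7 bad 111.034 GB
8 bad 111.035 GB
19 bad 535.806 GB
"""
TODAY_EXCERPT = """\
1 bad 66.044 GB
7 bad 111.034 GB
8 bad 111.035 GB
"""

if __name__ == "__main__":
    for kind, offsets in compare_runs(YESTERDAY_EXCERPT, TODAY_EXCERPT).items():
        print(kind, offsets)
```

Offsets are kept as strings so that values such as 132.56 are matched exactly as printed; prompt lines like "root@as1:~#" are skipped because they do not match the pattern.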