Dear all!
I configured two nodes with corosync and DRBD 8.4 on Ubuntu Server 14.04.02
LTS. After I updated the kernel on both nodes, the nodes cannot sync anymore. I
tried to downgrade to the last kernel that still worked, without success. The
sync process starts and then ends with a message of the form "BAD! BarrierAck
#432 received, expected #431!". I looked up the error, but it seems to be one
that not many people run into. Now I really don't know what to do anymore.
Maybe one of you can help me. I have to add that the two nodes are now running
in a production environment, so starting from scratch is not an option.
Backups of the data on the DRBD device are being made on a daily basis though.
My setup:
Kernel: 3.13.0-53-generic #89-Ubuntu SMP Wed May 20 10:34:39 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
DRBD:
DRBDADM_BUILDTAG=GIT-hash:\ 599f286440bd633d15d5ff985204aff4bccffadd\ build\ by\ phil@fat-tyre\,\ 2013-10-11\ 16:42:48
DRBDADM_API_VERSION=1
DRBD_KERNEL_VERSION_CODE=0x080403
DRBDADM_VERSION_CODE=0x080404
DRBDADM_VERSION=8.4.4
I use bonding for the network interfaces (on each node) so that I have a
fail-over. The DRBD devices are on RAID-5 software RAID block devices. As of
now my sync fails regularly at around 4-6% into the sync process.
dmesg tells me:
[317758.655502] d-con AxigenData: receiver terminated
[317758.655504] d-con AxigenData: Restarting receiver thread
[317758.655506] d-con AxigenData: receiver (re)started
[317758.655521] d-con AxigenData: conn( Unconnected -> WFConnection )
[317759.154195] d-con AxigenData: Handshake successful: Agreed network protocol version 101
[317759.154495] d-con AxigenData: Peer authenticated using 20 bytes HMAC
[317759.154669] d-con AxigenData: conn( WFConnection -> WFReportParams )
[317759.154673] d-con AxigenData: Starting asender thread (from drbd_r_AxigenDa [2421])
[317759.191027] block drbd0: drbd_sync_handshake:
[317759.191033] block drbd0: self 10FC6F75510DED77:D6EA27F99D183685:D6E927F99D183685:D6E827F99D183685 bits:13231595 flags:0
[317759.191037] block drbd0: peer D6EA27F99D183684:0000000000000000:26E9BD2F810F6A44:26E8BD2F810F6A45 bits:13231245 flags:0
[317759.191041] block drbd0: uuid_compare()=1 by rule 70
[317759.191043] block drbd0: Becoming sync source due to disk states.
[317759.191052] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )
[317759.242168] block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 120(1), total 120; compression: 100.0%
[317759.290871] block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 120(1), total 120; compression: 100.0%
[317759.290880] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0
[317759.292274] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0 exit code 0 (0x0)
[317759.292296] block drbd0: conn( WFBitMapS -> SyncSource )
[317759.292306] block drbd0: Began resync as SyncSource (will sync 52926408 KB [13231602 bits set]).
[317759.292367] block drbd0: updated sync UUID 10FC6F75510DED77:D6EB27F99D183685:D6EA27F99D183685:D6E927F99D183685
[317893.125154] d-con AxigenData: BAD! BarrierAck #519879 received, expected #519878!
[317893.143636] d-con AxigenData: peer( Secondary -> Unknown ) conn( SyncSource -> ProtocolError )
[317893.143670] d-con AxigenData: asender terminated
[317893.143674] d-con AxigenData: Terminating drbd_a_AxigenDa
[317893.278891] d-con AxigenData: Connection closed
[317893.278928] d-con AxigenData: conn( ProtocolError -> Unconnected )
[317893.278930] d-con AxigenData: receiver terminated
The thing is, the counters in the "BAD! ..." error message change every time
the sync process is terminated. I fsck-ed the ext4 partition on my UpToDate
node and it is clean, and smartctl tells me my disks are OK as well.
If you need more information, please let me know. I am more than happy to
provide any logs and details you might need in order to help me.
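To compare the counters across failed sync attempts, something like the
following Python sketch could pull the BarrierAck lines out of saved dmesg
output. The function name find_barrier_mismatches and the regular expression
are my own assumptions, written only against the message format shown in the
log above; nothing here comes from DRBD itself.

```python
import re

# Assumed line format, taken from the dmesg excerpt in this post:
# [317893.125154] d-con AxigenData: BAD! BarrierAck #519879 received, expected #519878!
BARRIER_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+d-con (?P<res>\S+): "
    r"BAD! BarrierAck #(?P<got>\d+) received, expected #(?P<want>\d+)!"
)


def find_barrier_mismatches(dmesg_text):
    """Return (timestamp, resource, received, expected) for each mismatch found."""
    hits = []
    # finditer scans the whole text, so it also works if line breaks were lost.
    for m in BARRIER_RE.finditer(dmesg_text):
        hits.append(
            (float(m.group("ts")), m.group("res"),
             int(m.group("got")), int(m.group("want")))
        )
    return hits


if __name__ == "__main__":
    sample = (
        "[317893.125154] d-con AxigenData: BAD! BarrierAck #519879 received, "
        "expected #519878!"
    )
    for ts, res, got, want in find_barrier_mismatches(sample):
        print(f"{ts:.6f} {res}: received {got}, expected {want}, "
              f"difference {got - want}")
```

Feeding it the output of several failed attempts would show whether the
received number is always exactly one ahead of the expected one, as in the
excerpt above.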
Thank you very much
Paolo