Dear all!
I configured two nodes with corosync and DRBD 8.4 on Ubuntu Server 14.04.02 
LTS. After I updated the kernel on both nodes, the nodes cannot sync anymore. I 
tried downgrading to the last kernel that still worked, without success. The 
sync process starts and then aborts with a message saying "BAD! BarrierAck #432 
received, expected #431". I looked up the error, but it seems to be one that 
not many people run into. Now I really don't know what to do anymore. Maybe one 
of you can help me. I have to add that the two nodes now work in a production 
environment, so starting from scratch is not an option. Backups of the data on 
the DRBD device are being made on a daily basis, though.
My setup:
Kernel: 3.13.0-53-generic #89-Ubuntu SMP Wed May 20 10:34:39 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
DRBD:
DRBDADM_BUILDTAG=GIT-hash:\ 599f286440bd633d15d5ff985204aff4bccffadd\ build\ by\ phil@fat-tyre\,\ 2013-10-11\ 16:42:48
DRBDADM_API_VERSION=1
DRBD_KERNEL_VERSION_CODE=0x080403
DRBDADM_VERSION_CODE=0x080404
DRBDADM_VERSION=8.4.4
I use bonding for the network interfaces (on each node) so that I have 
fail-over. The DRBD devices are on RAID-5 software RAID block devices. As of 
now, my sync fails regularly at around 4-6% into the sync process.
dmesg tells me:
[317758.655502] d-con AxigenData: receiver terminated
[317758.655504] d-con AxigenData: Restarting receiver thread
[317758.655506] d-con AxigenData: receiver (re)started
[317758.655521] d-con AxigenData: conn( Unconnected -> WFConnection )
[317759.154195] d-con AxigenData: Handshake successful: Agreed network protocol version 101
[317759.154495] d-con AxigenData: Peer authenticated using 20 bytes HMAC
[317759.154669] d-con AxigenData: conn( WFConnection -> WFReportParams )
[317759.154673] d-con AxigenData: Starting asender thread (from drbd_r_AxigenDa [2421])
[317759.191027] block drbd0: drbd_sync_handshake:
[317759.191033] block drbd0: self 10FC6F75510DED77:D6EA27F99D183685:D6E927F99D183685:D6E827F99D183685 bits:13231595 flags:0
[317759.191037] block drbd0: peer D6EA27F99D183684:0000000000000000:26E9BD2F810F6A44:26E8BD2F810F6A45 bits:13231245 flags:0
[317759.191041] block drbd0: uuid_compare()=1 by rule 70
[317759.191043] block drbd0: Becoming sync source due to disk states.
[317759.191052] block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS )
[317759.242168] block drbd0: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 120(1), total 120; compression: 100.0%
[317759.290871] block drbd0: receive bitmap stats [Bytes(packets)]: plain 0(0), RLE 120(1), total 120; compression: 100.0%
[317759.290880] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0
[317759.292274] block drbd0: helper command: /sbin/drbdadm before-resync-source minor-0 exit code 0 (0x0)
[317759.292296] block drbd0: conn( WFBitMapS -> SyncSource )
[317759.292306] block drbd0: Began resync as SyncSource (will sync 52926408 KB [13231602 bits set]).
[317759.292367] block drbd0: updated sync UUID 10FC6F75510DED77:D6EB27F99D183685:D6EA27F99D183685:D6E927F99D183685
[317893.125154] d-con AxigenData: BAD! BarrierAck #519879 received, expected #519878!
[317893.143636] d-con AxigenData: peer( Secondary -> Unknown ) conn( SyncSource -> ProtocolError )
[317893.143670] d-con AxigenData: asender terminated
[317893.143674] d-con AxigenData: Terminating drbd_a_AxigenDa
[317893.278891] d-con AxigenData: Connection closed
[317893.278928] d-con AxigenData: conn( ProtocolError -> Unconnected )
[317893.278930] d-con AxigenData: receiver terminated
The thing is, the counters in the "BAD! ..." error message change every time 
the sync process is terminated. I fsck-ed the ext4 partition on my UpToDate 
node and it is clean, and smartctl tells me my disks are OK as well. If you 
need more information, please let me know. I am more than happy to provide any 
logs and details you might need in order to help me.
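
In case it helps to compare the changing counters across several failed sync 
attempts, here is a minimal sketch of how they could be pulled out of saved 
dmesg output. The function name barrier_mismatches and the regular expression 
are my own assumptions, based only on the message format shown in the log 
above; they are not part of DRBD or its tools.

```python
import re

# Pattern assumed from the log line shown above, e.g.
# "BAD! BarrierAck #519879 received, expected #519878!"
# \s+ is used between words so that wrapped log lines still match.
BARRIER_RE = re.compile(
    r"BAD!\s+BarrierAck\s+#(\d+)\s+received,\s+expected\s+#(\d+)"
)

def barrier_mismatches(log_text):
    """Return (received, expected, difference) for each BarrierAck error."""
    results = []
    for match in BARRIER_RE.finditer(log_text):
        received = int(match.group(1))
        expected = int(match.group(2))
        results.append((received, expected, received - expected))
    return results

sample = (
    "[317893.125154] d-con AxigenData: BAD! BarrierAck #519879 received, "
    "expected #519878!"
)
print(barrier_mismatches(sample))  # prints [(519879, 519878, 1)]
```

Running it over the output of several attempts would show whether the received 
number is always exactly one ahead of the expected number, as it is in the run 
above.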
Thank you very much
Paolo