Greetings everyone,

We are running a small 3 node drbd 9.0.21 cluster, used for opennebula vm 
storage. Lately we have started recieving messages from one of our servers, 
about getting a "Wrong magic value 0x52464200 in protocol version 86". We are 
using lacp bonded 2 gig links for traffic exchange between drbd hosts. Both 
lacp links are working fine and no outages were detected previously. It's also 
interesting to point out that drbd9 seems to "ignore" the problematic node and 
doesn't place any vm's on it.

​

Here is a list of kernel versions on our servers:

100 - 4.15.0-72-generic #81-Ubuntu SMP         #controller node

101 - 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 #storage node

102 - 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 #storage  node



We tried changing network rendering engines, checking /tmp dir size, making 
non-lacp network between servers, and searching for network inconsistencies and 
even used different networking equipment  but unfortunately, we haven't 
achieved any results.

The problem is highly reproducible in our environment ( even after fully 
reinstalling drbd, we still get magic value error ether on 101 or 102, while 
resources are avoiding deployment  on 102).





We get following logs in kernel ring buffer on 102:

[Wed Feb  5 21:02:29 2020] drbd one-image-10: susp-io( no -> user) 




[Wed Feb  5 21:02:32 2020] drbd one-image-10: susp-io( user -> no)


[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0: Starting worker thread 
(from drbdsetup [20138])

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: Starting 
sender thread (from drbdsetup [20143])

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Starting 
sender thread (from drbdsetup [20146])

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: conn( 
StandAlone -> Unconnected )

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: Starting 
receiver thread (from drbd_w_one-vm-1 [20139])

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: conn( 
Unconnected -> Connecting )

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn( 
StandAlone -> Unconnected )

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Starting 
receiver thread (from drbd_w_one-vm-1 [20139])

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn( 
Unconnected -> Connecting )

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: Failed to 
initiate connection, err=-98

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: conn( 
Connecting -> Disconnecting )

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: Aborting 
remote state change 0 commit not possible

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: Restarting 
sender thread

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: Connection 
closed

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: conn( 
Disconnecting -> StandAlone )

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: Terminating 
receiver thread

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Failed to 
initiate connection, err=-98

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn( 
Connecting -> Disconnecting )

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Aborting 
remote state change 0 commit not possible

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Restarting 
sender thread

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Connection 
closed

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn( 
Disconnecting -> StandAlone )

[Wed Feb  5 21:02:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Terminating 
receiver thread

[Wed Feb  5 21:07:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n100: Terminating 
sender thread

[Wed Feb  5 21:07:35 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Terminating 
sender thread

[Wed Feb  5 21:07:35 2020] drbd one-vm-1385-disk-0/0 drbd1099: drbd_bm_resize 
called with capacity == 0

[Wed Feb  5 21:07:35 2020] drbd one-vm-1385-disk-0: Terminating worker thread



Seems quite a similar case described here: 
https://lists.linbit.com/pipermail/drbd-user/2009-October/012777.html



At the same time on 100 (controller node):

[Wed Feb  5 21:02:29 2020] drbd one-image-10: susp-io( no -> user) 





[Wed Feb  5 21:02:33 2020] drbd one-image-10: susp-io( user -> no)

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0: Starting worker thread 
(from drbdsetup [3965])

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Starting 
sender thread (from drbdsetup [3971])

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Starting 
sender thread (from drbdsetup [3975])

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: meta-data IO 
uses: blk-bio

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: disk( Diskless 
-> Attaching )

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: Maximum number 
of peer devices = 7

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0: Method to ensure write 
ordering: flush

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: drbd_bm_resize 
called with capacity == 4201520

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: resync bitmap: 
bits=525190 words=57449 pages=113

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: size = 2052 MB 
(2100760 KB)

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: size = 2052 MB 
(2100760 KB)

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: recounting of 
set bits took additional 0ms

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: disk( Attaching 
-> UpToDate )

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0/0 drbd1099: attached to 
current UUID: C518769AA829C854

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn( 
StandAlone -> Unconnected )

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Starting 
receiver thread (from drbd_w_one-vm-1 [3966])

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn( 
Unconnected -> Connecting )

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn( 
StandAlone -> Unconnected )

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Starting 
receiver thread (from drbd_w_one-vm-1 [3966])

[Wed Feb  5 21:02:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn( 
Unconnected -> Connecting )

[Wed Feb  5 21:02:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Handshake to 
peer 0 successful: Agreed network protocol version 116

[Wed Feb  5 21:02:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Feature 
flags enabled on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.

[Wed Feb  5 21:02:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Peer 
authenticated using 20 bytes HMAC

[Wed Feb  5 21:02:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Starting 
ack_recv thread (from drbd_r_one-vm-1 [4004])

[Wed Feb  5 21:02:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Preparing 
remote state change 1265772256

[Wed Feb  5 21:02:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Committing 
remote state change 1265772256 (primary_nodes=0)

[Wed Feb  5 21:02:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn( 
Connecting -> Connected ) peer( Unknown -> Secondary )

[Wed Feb  5 21:02:37 2020] drbd one-vm-1385-disk-0/0 drbd1099 nts-cloud-n101: 
drbd_sync_handshake:

[Wed Feb  5 21:02:37 2020] drbd one-vm-1385-disk-0/0 drbd1099 nts-cloud-n101: 
self C518769AA829C854:0000000000000000:0000000000000000:0000000000000000 bits:0 
flags:20

[Wed Feb  5 21:02:37 2020] drbd one-vm-1385-disk-0/0 drbd1099 nts-cloud-n101: 
peer C518769AA829C854:0000000000000000:0000000000000000:0000000000000000 bits:0 
flags:20

[Wed Feb  5 21:02:37 2020] drbd one-vm-1385-disk-0/0 drbd1099 nts-cloud-n101: 
uuid_compare()=no-sync by rule 40

[Wed Feb  5 21:02:37 2020] drbd one-vm-1385-disk-0/0 drbd1099: quorum( no -> 
yes )

[Wed Feb  5 21:02:37 2020] drbd one-vm-1385-disk-0/0 drbd1099 nts-cloud-n101: 
pdsk( DUnknown -> UpToDate ) repl( Off -> Established )

[Wed Feb  5 21:02:48 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Wrong magic 
value 0x52464220 in protocol version 86 





[Wed Feb  5 21:02:48 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn( 
Connecting -> NetworkFailure )

[Wed Feb  5 21:02:48 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Restarting 
sender thread

[Wed Feb  5 21:02:48 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Connection 
closed

[Wed Feb  5 21:02:48 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn( 
NetworkFailure -> Unconnected )

[Wed Feb  5 21:02:49 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn( 
Unconnected -> Connecting )

[Wed Feb  5 21:03:01 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Wrong magic 
value 0x52464220 in protocol version 86

[Wed Feb  5 21:03:01 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn( 
Connecting -> NetworkFailure )

[Wed Feb  5 21:03:01 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Restarting 
sender thread

[Wed Feb  5 21:03:01 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Connection 
closed

[Wed Feb  5 21:03:01 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn( 
NetworkFailure -> Unconnected )

[Wed Feb  5 21:03:02 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn( 
Unconnected -> Connecting )

[Wed Feb  5 21:03:11 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Wrong magic 
value 0x52464220 in protocol version 86

<last 6 entries repeat for a while>
[Wed Feb  5 21:07:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Restarting 
sender thread 





[Wed Feb  5 21:07:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Connection 
closed

[Wed Feb  5 21:07:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: conn( 
Disconnecting -> StandAlone )

[Wed Feb  5 21:07:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Terminating 
receiver thread

[Wed Feb  5 21:07:36 2020] drbd one-vm-1385-disk-0 nts-cloud-n102: Terminating 
sender thread

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0: Preparing cluster-wide 
state change 1186333411 (1->0 496/16)

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0: State change 1186333411: 
primary_nodes=0, weak_nodes=0

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Cluster is 
now split

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0: Committing cluster-wide 
state change 1186333411 (0ms)

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn( 
Connected -> Disconnecting ) peer( Secondary -> Unknown )

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0/0 drbd1099: quorum( yes -> 
no )

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0/0 drbd1099 nts-cloud-n101: 
pdsk( UpToDate -> DUnknown ) repl( Established -> Off )

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: ack_receiver 
terminated

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Terminating 
ack_recv thread

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Aborting 
remote state change 0 commit not possible

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Restarting 
sender thread

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Connection 
closed

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: conn( 
Disconnecting -> StandAlone )

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Terminating 
receiver thread

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0 nts-cloud-n101: Terminating 
sender thread

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0/0 drbd1099: disk( UpToDate 
-> Detaching )

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0/0 drbd1099: disk( Detaching 
-> Diskless )

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0/0 drbd1099: drbd_bm_resize 
called with capacity == 0

[Wed Feb  5 21:07:37 2020] drbd one-vm-1385-disk-0: Terminating worker thread



drbdadm status on 100:

one-vm-1385-disk-0 role:Secondary 





  disk:UpToDate

  nts-cloud-n101 role:Secondary

    peer-disk:UpToDate

  nts-cloud-n102 connection:Connecting





Best of all,

Gleb.
_______________________________________________
Star us on GITHUB: https://github.com/LINBIT
drbd-user mailing list
[email protected]
https://lists.linbit.com/mailman/listinfo/drbd-user

Reply via email to