I have a setup where I can reliably reproduce the following within a few
minutes:
Jul 11 10:59:46 wrn-vm2 kernel: [236603.130604] block drbd0: uuid_compare()=-1 by rule 35
Jul 11 10:59:46 wrn-vm2 kernel: [236603.135779] block drbd0: I shall become SyncTarget, but I am primary!
Jul 11 10:59:46 wrn-vm2 kernel: [236603.142336] block drbd0: ASSERT( os.conn == C_WF_REPORT_PARAMS ) in /build/linux-s5x2oE/linux-3.2.46/drivers/block/drbd/drbd_receiver.c:3245
It's on Debian Wheezy with the stock Debian kernel (3.2.0-4-amd64):
Jun 25 15:01:27 wrn-vm1 kernel: [ 626.901545] drbd: initialized. Version: 8.3.11 (api:88/proto:86-96)
Jun 25 15:01:27 wrn-vm1 kernel: [ 626.901547] drbd: srcversion: F937DCB2E5D83C6CCE4A6C9
There are more details in this thread:
https://groups.google.com/forum/#!topic/ganeti/icqLNFk1si0
I am reproducing it using ganeti, which uses drbd on top of LVM logical
volumes to replicate virtual machine images. It migrates virtual
machines by issuing drbdsetup commands to switch the master->slave
replication first to multi-master, and then to slave<-master
(apparently by disconnecting and reconnecting). I believe there is some
sort of race condition going on, because (a) few if any other people
seem to observe what I see; and (b) although I can reproduce the problem
within a few minutes, if I attach a full-blown strace to the process
which is issuing the drbdsetup calls, the problem goes away.
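
(If anyone wants to capture the same thing themselves: something along
these lines logs just the execve() calls of the children while leaving
the process otherwise mostly alone; the PID is of course a placeholder
for whichever daemon is issuing the drbdsetup calls.)

    strace -f -tt -e trace=execve -p <pid> -o /tmp/drbdsetup-execve.log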
The google groups thread includes an strace log of the execve() calls,
so you can see exactly what sequence of drbdsetup calls is being issued;
a rough sketch of the overall switchover, as I understand it, is below.
Is it possible that ganeti is taking an unsafe approach to switching
over the drbd state?
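
For context, the general shape of the switchover looks roughly like
this. The device path, addresses, port and exact option spellings are
placeholders from memory, not copied from the log, so please check the
log itself for the real arguments:

    # On the migration target: reconnect in dual-primary mode, then promote
    drbdsetup /dev/drbd0 disconnect
    drbdsetup /dev/drbd0 net 192.0.2.2:7789 192.0.2.1:7789 C --allow-two-primaries
    drbdsetup /dev/drbd0 primary

    # ... the VM is live-migrated while both nodes are primary ...

    # On the old primary: demote, then both sides reconnect without
    # --allow-two-primaries
    drbdsetup /dev/drbd0 secondary
    drbdsetup /dev/drbd0 disconnect
    drbdsetup /dev/drbd0 net 192.0.2.1:7789 192.0.2.2:7789 C

If the demote and the reconnects on the two nodes can interleave the
wrong way, that would seem consistent with one node being told to become
SyncTarget while it still thinks it is primary, as in the log above.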
Regards,
Brian Candler.