I have a setup where I can reliably reproduce the following within a few
minutes:
Jul 11 10:59:46 wrn-vm2 kernel: [236603.130604] block drbd0: uuid_compare()=-1 by rule 35
Jul 11 10:59:46 wrn-vm2 kernel: [236603.135779] block drbd0: I shall become SyncTarget, but I am primary!
Jul 11 10:59:46 wrn-vm2 kernel: [236603.142336] block drbd0: ASSERT( os.conn == C_WF_REPORT_PARAMS ) in /build/linux-s5x2oE/linux-3.2.46/drivers/block/drbd/drbd_receiver.c:3245
It's on Debian Wheezy with the stock Debian kernel (3.2.0-4-amd64):
Jun 25 15:01:27 wrn-vm1 kernel: [ 626.901545] drbd: initialized. Version: 8.3.11 (api:88/proto:86-96)
Jun 25 15:01:27 wrn-vm1 kernel: [ 626.901547] drbd: srcversion: F937DCB2E5D83C6CCE4A6C9
There are more details in this thread:
https://groups.google.com/forum/#!topic/ganeti/icqLNFk1si0
I am reproducing it using ganeti, which uses drbd on top of LVM logical
volumes to replicate virtual machine images. It migrates virtual
machines by issuing drbdsetup commands to switch the master->slave
replication first to multi-master, and then to slave<-master
(apparently by disconnecting and reconnecting). I believe there is some
sort of race condition going on, because (a) few if any other people
seem to observe what I see; and (b) although I can reproduce the problem
within a few minutes, if I attach a full-blown strace to the process
which is issuing the drbdsetup calls, the problem goes away.
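
(If anyone wants to capture the same thing themselves: something along
these lines logs just the execve() calls of the children while leaving
the process otherwise mostly alone; the PID is of course a placeholder
for whichever daemon is issuing the drbdsetup calls.)

    strace -f -tt -e trace=execve -p <pid> -o /tmp/drbdsetup-execve.log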
The google groups thread includes an strace log of the execve() calls,
so you can see exactly what sequence of drbdsetup calls is being issued;
a rough sketch of the overall switchover, as I understand it, is below.
Is it possible that ganeti is taking an unsafe approach to switching
over the drbd state?
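
For context, the general shape of the switchover looks roughly like
this. The device path, addresses, port and exact option spellings are
placeholders from memory, not copied from the log, so please check the
log itself for the real arguments:

    # On the migration target: reconnect in dual-primary mode, then promote
    drbdsetup /dev/drbd0 disconnect
    drbdsetup /dev/drbd0 net 192.0.2.2:7789 192.0.2.1:7789 C --allow-two-primaries
    drbdsetup /dev/drbd0 primary

    # ... the VM is live-migrated while both nodes are primary ...

    # On the old primary: demote, then both sides reconnect without
    # --allow-two-primaries
    drbdsetup /dev/drbd0 secondary
    drbdsetup /dev/drbd0 disconnect
    drbdsetup /dev/drbd0 net 192.0.2.1:7789 192.0.2.2:7789 C

If the demote and the reconnects on the two nodes can interleave the
wrong way, that would seem consistent with one node being told to become
SyncTarget while it still thinks it is primary, as in the log above.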
Regards,
Brian Candler.