Hi, first of all many thanks to everybody (especially Linbit) for the
excellent work done on DRBD.
I'm currently involved in setting up a Pacemaker / DRBD cluster to serve
a bunch of Xen VMs.
My current configuration is:
- 2 identical Dell R410 servers, each with a quad-core Xeon and 8 GB RAM
- 4x 1 TB SATA drives per server, attached to Dell's PERC 6/i battery-backed
RAID controller and configured as RAID 10
- Dom0: Slackware 13.0 x86_64 with Xen 4.0 compiled from source
(2.6.31.13 xenified kernel)
- OpenAIS 1.1.2 + Pacemaker 1.0.8 compiled from source
- DRBD 8.3.7 compiled from source
The configuration files for DRBD and the DomUs are local to each host
(replicated by hand); the only shared data on DRBD are the guests' block
devices: one DRBD resource per guest, built on two identically sized LVM
logical volumes, with Xen using /dev/drbdX as the guest's block device.
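Just to give an idea, each per-guest resource looks roughly like the sketch
below (host names, the minor number and the LV paths are made up here, not
my real config):

resource vm01 {
  protocol C;
  on node1 {
    device    /dev/drbd1;
    disk      /dev/vg0/vm01-disk;
    address   10.0.0.1:7789;
    meta-disk internal;
  }
  on node2 {
    device    /dev/drbd1;
    disk      /dev/vg0/vm01-disk;
    address   10.0.0.2:7789;
    meta-disk internal;
  }
}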
The whole thing is working flawlessly and seems fast and stable too.
I've got a question regarding HVMs (Windows guests) and Primary/Primary
DRBD for live migration.
The DRBD docs state that I can't use the block-drbd helper script with
HVMs (and in a few other cases).
I'm relatively new to setting up a Pacemaker cluster, so the question is:
how can I make the Pacemaker OCF RAs take care of
1) promoting DRBD on the target node
2) live migrating the HVM
3) demoting DRBD on the source node
(which is basically what the block-drbd helper is supposed to do)?
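To make the question more concrete, the kind of configuration I'm
experimenting with looks roughly like this (crm shell syntax; resource
names, the DomU config path and the monitor intervals are placeholders,
not my real setup):

primitive drbd_vm01 ocf:linbit:drbd \
        params drbd_resource="vm01" \
        op monitor interval="29s" role="Master" \
        op monitor interval="31s" role="Slave"
ms ms_drbd_vm01 drbd_vm01 \
        meta master-max="1" clone-max="2" notify="true"
primitive xen_vm01 ocf:heartbeat:Xen \
        params xmfile="/etc/xen/vm01.cfg" \
        meta allow-migrate="true"
colocation xen_on_drbd_master inf: xen_vm01 ms_drbd_vm01:Master
order drbd_before_xen inf: ms_drbd_vm01:promote xen_vm01:start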
If I haven't missed something, setting master-max=1 on the ocf:linbit:drbd
master/slave resource doesn't allow the DRBD resource to be
Primary/Primary even during the (short) time of the Xen live migration;
on the other hand, setting master-max=2 causes DRBD to run constantly in
Primary/Primary mode, posing some data corruption risks (I don't want to
use a clustered filesystem, because I want to store the DomUs' filesystems
on a physical device for best performance).
In short, I would like to leave the guests' DRBD resources Primary only on
the active host (where the corresponding DomU has been started), set them
to Primary/Primary only while migrating the DomU to the other host, and
then, for safety, quickly demote the resource on the source host once the
migration is complete.
This apparently can't be done without some form of "cooperation" between
the ocf:linbit:drbd and ocf:heartbeat:Xen resource agents... which is
exactly what I'm looking for but apparently cannot find in Pacemaker's docs.
Please tell me if I'm missing something crucial about OCF RAs and their
usage in this situation.
Now the (apparently) good news...
The approach taken by block-drbd seemed more logical to me, having a
single OCF RA manage and coordinate the whole transition (DomU migration +
DRBD promotion/demotion).
Digging around I found this patch (many thanks and full credit to its
author, James Harper):
http://post.gmane.org/post.php?group=gmane.comp.emulators.xen.devel&followup=80598
So I investigated a little more and got to the point where I can instruct
a qemu-dm (patched as described) to recognize DRBD resources (specified as
drbd:resource in the Xen DomU cfg file) and map them to the correct
/dev/drbd/by-res/xxx node in the HVM... sadly this solved only part of the
problem.
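For reference, the disk line in the HVM's cfg file then looks something
like this (the resource name is just an example):

disk = [ 'drbd:vm01,hda,w' ]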
Starting from a Primary/Secondary state, launching (xm create) the HVM on
the "Primary" DRBD host works perfectly.
After this I can live migrate the HVM to the other host and obtain the
sequence of promotions/demotions from the block-drbd script, leaving the
system in the expected state.
Starting from a Secondary/Secondary DRBD state (DomU stopped on both
hosts), when I "xm create" an HVM DomU, qemu-dm is fired before the
block-drbd helpers are launched, so the HVM correctly maps the device, but
DRBD has not yet been promoted to Primary and the DomU is immediately
turned off.
BTW: can someone explain this difference?
Why is the block-drbd script called BEFORE starting a live migration (also
making the destination host Primary before attempting to migrate), but not
BEFORE (only after) attempting to create the DomU and map the vbd via
qemu-dm?
Looking at the state transitions in /proc/drbd during an "xm create" of my
HVM DomU, I saw that the Secondary->Primary transition does happen...
normally followed by the inverse transition just after qemu-dm "finds" the
resource in an unusable state and aborts the creation of the DomU... it
seems to be only a timing problem!
qemu-dm is too fast (it is even started a bit earlier) and checks the vbd
BEFORE the block-drbd script can promote the resource to Primary... the
logical (but badly hackish) solution for me was to insert a delay in the
qemu-dm process when the resource IS of type "drbd:".
So, the final state is:
- I can create HVM guests using the "drbd:res" syntax in the configuration
files, with block-drbd taking care of the DRBD transitions
- I can migrate / live migrate HVM (Windows) guests with block-drbd doing
its job
- my solution is a bad hack (at least for HVM creation), based on a delay
inserted in qemu-dm to wait for the block-drbd execution.
The complete patch to the Xen 4.0 source (AGAIN, THANKS TO James Harper) is:
--- xenstore.c.orig	2010-04-29 23:23:45.720258686 +0200
+++ xenstore.c	2010-04-29 22:52:43.897264812 +0200
@@ -513,6 +513,15 @@
             params = newparams;
             format = &bdrv_raw;
         }
+        /* handle drbd mapping */
+        if (!strcmp(drv, "drbd")) {
+            char *newparams = malloc(17 + strlen(params) + 1);
+            sprintf(newparams, "/dev/drbd/by-res/%s", params);
+            free(params);
+            sleep(5);
+            params = newparams;
+            format = &bdrv_raw;
+        }
 #if 0
         /* Phantom VBDs are disabled because the use of paths
I've only added the sleep(5); statement to make qemu-dm relax a bit and
wait for block-drbd to be called.
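One idea to make the wait less arbitrary (untested, just a sketch of what
I have in mind; the helper name and the 30 second timeout are made up):
instead of a fixed sleep, qemu-dm could retry opening the device
read-write, which should only succeed once block-drbd has promoted the
resource to Primary:

#include <fcntl.h>
#include <unistd.h>

/* Sketch only: poll until the drbd device can be opened read-write,
 * i.e. until block-drbd has promoted the resource to Primary.
 * The sleep(5) above would become something like
 * wait_for_drbd_primary(newparams, 30); */
static int wait_for_drbd_primary(const char *path, int timeout_secs)
{
    int i;

    for (i = 0; i < timeout_secs * 10; i++) {
        int fd = open(path, O_RDWR);
        if (fd >= 0) {
            close(fd);
            return 0;   /* device usable, resource is Primary */
        }
        usleep(100000); /* wait 100 ms and retry */
    }
    return -1;          /* timed out, let the caller fail as before */
}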
Please share your comments and ideas to stabilize and improve the patch,
making it less hackish (at least my little addition) and possibly suitable
for production use (probably by finding a reliable way to "wait" for the
DRBD state change in qemu-dm).
Sauro Saltini.