Hi, first of all many thanks to everybody (especially Linbit) for the
excellent work done on DRBD.
I'm currently involved in setting up a Pacemaker / DRBD cluster to serve
a bunch of Xen VMs.
My current configuration is:
- 2 identical Dell R410 servers, each with a quad-core Xeon and 8 GB RAM
- 4x 1 TB SATA drives per server, attached to Dell's PERC 6/i battery-backed
RAID controller and configured as RAID 10
- Dom0: Slackware 13.0 x86_64 with Xen 4.0 compiled from source
(2.6.31.13 xenified kernel)
- OpenAIS 1.1.2 + Pacemaker 1.0.8 compiled from source
- DRBD 8.3.7 compiled from source
The configuration files for DRBD and the DomUs are local to each host
(replicated by hand); the only shared data on DRBD are the guests' block
devices: one DRBD resource per guest, built on two identically sized LVM
logical volumes, with Xen using /dev/drbdX as the guest's block device.
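Just to give an idea, each per-guest resource looks roughly like the sketch
below (host names, the minor number and the LV paths are made up here, not
my real config):

resource vm01 {
  protocol C;
  on node1 {
    device    /dev/drbd1;
    disk      /dev/vg0/vm01-disk;
    address   10.0.0.1:7789;
    meta-disk internal;
  }
  on node2 {
    device    /dev/drbd1;
    disk      /dev/vg0/vm01-disk;
    address   10.0.0.2:7789;
    meta-disk internal;
  }
}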
The whole thing is working flawlessly and seems fast and stable too.
I've got a question regarding HVMs (Windows guests) and Primary/Primary
DRBD for live migration.
The DRBD docs state that I can't use the block-drbd helper script with
HVMs (and in a few other cases).
I'm relatively new to setting up a Pacemaker cluster, so the question is:
how can I make the Pacemaker OCF RAs take care of
1) promoting DRBD on the target node
2) live migrating the HVM
3) demoting DRBD on the source node
(which is basically what the block-drbd helper is supposed to do)?
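To make the question more concrete, the kind of configuration I'm
experimenting with looks roughly like this (crm shell syntax; resource
names, the DomU config path and the monitor intervals are placeholders,
not my real setup):

primitive drbd_vm01 ocf:linbit:drbd \
        params drbd_resource="vm01" \
        op monitor interval="29s" role="Master" \
        op monitor interval="31s" role="Slave"
ms ms_drbd_vm01 drbd_vm01 \
        meta master-max="1" clone-max="2" notify="true"
primitive xen_vm01 ocf:heartbeat:Xen \
        params xmfile="/etc/xen/vm01.cfg" \
        meta allow-migrate="true"
colocation xen_on_drbd_master inf: xen_vm01 ms_drbd_vm01:Master
order drbd_before_xen inf: ms_drbd_vm01:promote xen_vm01:start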
If I haven't missed something, setting master-max=1 on the ocf:linbit:drbd
master/slave resource doesn't allow the DRBD resource to be
Primary/Primary even during the (short) time of the Xen live migration;
on the other hand, setting master-max=2 causes DRBD to run constantly in
Primary/Primary mode, posing some data corruption risks (I don't want to
use a clustered filesystem, because I want to store the DomUs' filesystems
on a physical device for best performance).
In short, I would like to leave the guests' DRBD resources Primary only on
the active host (where the corresponding DomU has been started), set them
to Primary/Primary only while migrating the DomU to the other host, and
then, for safety, quickly demote the resource on the source host once the
migration is complete.
This apparently can't be done without some form of "cooperation" between
the ocf:linbit:drbd and ocf:heartbeat:Xen resource agents... which is
exactly what I'm looking for but apparently cannot find in Pacemaker's docs.
Please tell me if I'm missing something crucial about OCF RAs and their
usage in this situation.
Now the (apparently) good news...
The approach taken by block-drbd seemed more logical to me, having a
single OCF RA manage and coordinate the whole transition (DomU migration +
DRBD promotion/demotion).
Digging around I found this patch (many thanks and full credit to its
author, James Harper):
http://post.gmane.org/post.php?group=gmane.comp.emulators.xen.devel&followup=80598
So I investigated a little more and got to the point where I can instruct
a qemu-dm (patched as described) to recognize DRBD resources (specified as
drbd:resource in the Xen DomU cfg file) and map them to the correct
/dev/drbd/by-res/xxx node in the HVM... sadly this solved only part of the
problem.
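For reference, the disk line in the HVM's cfg file then looks something
like this (the resource name is just an example):

disk = [ 'drbd:vm01,hda,w' ]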
Starting from a Primary/Secondary state, launching (xm create) the HVM on
the "Primary" DRBD host works perfectly.
After this I can live migrate the HVM to the other host and obtain the
sequence of promotions/demotions from the block-drbd script, leaving the
system in the expected state.
Starting from a Secondary/Secondary DRBD state (DomU stopped on both
hosts), when I "xm create" an HVM DomU, qemu-dm is fired before the
block-drbd helpers are launched, so the HVM correctly maps the device, but
DRBD has not yet been promoted to Primary and the DomU is immediately
turned off.
BTW: can someone explain this difference?
Why is the block-drbd script called BEFORE starting a live migration (also
making the destination host Primary before attempting to migrate), but not
BEFORE (only after) attempting to create the DomU and map the vbd via
qemu-dm?
Looking at the state transitions in /proc/drbd during an "xm create" of my
HVM DomU, I saw that the Secondary->Primary transition does happen...
normally followed by the inverse transition just after qemu-dm "finds" the
resource in an unusable state and aborts the creation of the DomU... it
seems to be only a timing problem!
qemu-dm is too fast (it is even started a bit earlier) and checks the vbd
BEFORE the block-drbd script can promote the resource to Primary... the
logical (but badly hackish) solution for me was to insert a delay in the
qemu-dm process when the resource IS of type "drbd:".
So, the final state is:
- I can create HVM guests using the "drbd:res" syntax in the configuration
files, with block-drbd taking care of the DRBD transitions
- I can migrate / live migrate HVM (Windows) guests with block-drbd doing
its job
- my solution is a bad hack (at least for HVM creation), based on a delay
inserted in qemu-dm to wait for the block-drbd execution.
The complete patch to the Xen 4.0 source (AGAIN, THANKS TO James Harper) is:
--- xenstore.c.orig	2010-04-29 23:23:45.720258686 +0200
+++ xenstore.c	2010-04-29 22:52:43.897264812 +0200
@@ -513,6 +513,15 @@
             params = newparams;
             format = &bdrv_raw;
         }
+        /* handle drbd mapping */
+        if (!strcmp(drv, "drbd")) {
+            char *newparams = malloc(17 + strlen(params) + 1);
+            sprintf(newparams, "/dev/drbd/by-res/%s", params);
+            free(params);
+            sleep(5);
+            params = newparams;
+            format = &bdrv_raw;
+        }
 #if 0
         /* Phantom VBDs are disabled because the use of paths
I've only added the sleep(5); statement to make qemu-dm relax a bit and
wait for block-drbd to be called.
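One idea to make the wait less arbitrary (untested, just a sketch of what
I have in mind; the helper name and the 30 second timeout are made up):
instead of a fixed sleep, qemu-dm could retry opening the device
read-write, which should only succeed once block-drbd has promoted the
resource to Primary:

#include <fcntl.h>
#include <unistd.h>

/* Sketch only: poll until the drbd device can be opened read-write,
 * i.e. until block-drbd has promoted the resource to Primary.
 * The sleep(5) above would become something like
 * wait_for_drbd_primary(newparams, 30); */
static int wait_for_drbd_primary(const char *path, int timeout_secs)
{
    int i;

    for (i = 0; i < timeout_secs * 10; i++) {
        int fd = open(path, O_RDWR);
        if (fd >= 0) {
            close(fd);
            return 0;   /* device usable, resource is Primary */
        }
        usleep(100000); /* wait 100 ms and retry */
    }
    return -1;          /* timed out, let the caller fail as before */
}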
Please share your comments and ideas to stabilize and improve the patch,
making it less hackish (at least my little addition) and possibly suitable
for production use (probably by finding a reliable way to "wait" for the
DRBD state change in qemu-dm).
Sauro Saltini.