Hi,

I experienced some problems with Xen resources.

I have a four-node cluster running 14 Xen domUs.

To avoid a race condition on startup, e.g. when one node comes back from 
standby or one node dies, I had to add an order constraint for every pair of 
domUs:

e.g. domU1 before domU2
     domU1 before domU3
     domU1 before domU4
     ...
     domU12 before domU13
     domU12 before domU14
     domU13 before domU14
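
To give a sense of scale, the pairwise ordering listed above can be 
enumerated with a small shell sketch. It only prints the pairs in the 
notation used in this mail; it does not emit real constraint syntax, and the 
function name list_order_pairs is just an illustration. For 14 domUs it 
produces 14*13/2 = 91 pairs.

```shell
#!/bin/sh
# Print every "domUi before domUj" pair with i < j for n domUs.
# Illustrative only: the names domU1..domUn are placeholders.
list_order_pairs() {
    n="$1"
    i=1
    while [ "$i" -lt "$n" ]; do
        j=$((i + 1))
        while [ "$j" -le "$n" ]; do
            echo "domU$i before domU$j"
            j=$((j + 1))
        done
        i=$((i + 1))
    done
}
```

Running "list_order_pairs 14 | wc -l" counts the constraints that the 
cluster has to evaluate.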

This pairwise ordering works well for me on several two-node clusters with 
about 4 or 5 Xen domUs each.

However, in the larger cluster, the cluster is mostly busy with itself, 
managing resources. For example, the GUI often seems to hang because crmd is 
too busy. The cluster is busy propagating CIB updates, and commands time 
out.

I added a lock on startup, so that when more than one domU wants to start at 
the same time, the first one creates a lock directory and the others wait 
until that directory has disappeared before they start.
Because of this lock, a start may run into the start timeout. To mitigate 
this, I added a start operation with a timeout of 120s to each Xen resource, 
and I have not hit that timeout since.
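
For readers without the patch at hand, the idea can be sketched roughly as 
follows. This is a minimal illustration and not the code from the patch: the 
path in LOCKDIR, the function names, and the one-second polling interval are 
all assumptions. It relies on mkdir being atomic, so only one caller can 
create the directory at a time.

```shell
#!/bin/sh
# Hypothetical lock location; the actual patch may use a different path.
LOCKDIR="${LOCKDIR:-/tmp/Xen.lock.demo}"

# Try to take the lock, polling once per second.
# $1: maximum number of seconds to wait (assumed default: 100).
# Returns 0 once the lock directory was created, 1 on giving up.
xen_start_lock() {
    max_wait="${1:-100}"
    waited=0
    until mkdir "$LOCKDIR" 2>/dev/null; do
        [ "$waited" -ge "$max_wait" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Release the lock so the next waiting domU can start.
xen_start_unlock() {
    rmdir "$LOCKDIR" 2>/dev/null
}
```

A start action would call xen_start_lock before creating the domU and 
xen_start_unlock afterwards. The waiting time counts against the start 
timeout of the resource, which is why the longer timeout matters. This sketch 
does not handle a stale lock left behind by a crashed start.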

Appended is a patch with the locks that I added, which works well for me, but 
I suspect it may not be perfect, so input is welcome.

Some questions:
 - Is there a better location for the lock directory, e.g. $OCF_ROOT/Xen.lock?
 - Should startup locking only be enabled when memory management is enabled?

Because I now use the locking instead of the one-to-one order constraints, 
the whole startup of the cluster is faster: the ordering was cluster-wide, 
whereas the locking only applies to the domUs on one node.

Any input is appreciated.

regards
Sebastian
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/