Hi,
I experienced some problems with Xen resources.
I have a 4 node cluster, running 14 Xen domU's.
To avoid a race condition on startup, e.g. when one node comes back from
standby or one node dies, I had to add order constraints for each domU,
e.g. domU1 before domU2
domU1 before domU3
domU1 before domU4
...
domU12 before domU13
domU12 before domU14
domU13 before domU14
Such a setup works well for me on several two-node clusters with about 4 or 5
Xen domUs each.
However, the larger cluster ends up busy with itself, managing resources.
E.g. the GUI often seems to hang because crmd is too busy. The cluster is
occupied with propagating updates of the cib, and commands time out.
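For a sense of scale, here is a small sketch that enumerates the constraints, assuming every pair of domUs is ordered as the list above suggests. The function name count_pairs and the domU names are placeholders, not part of any cluster tool. N domUs need N*(N-1)/2 constraints, so 5 domUs need 10 and 14 domUs need 91.

```shell
#!/bin/sh
# Enumerate "domUi before domUj" for every pair i < j and print the total.
# count_pairs is a hypothetical helper for illustration only.
count_pairs() {
    n=$1
    i=1
    total=0
    while [ "$i" -lt "$n" ]; do
        j=$((i + 1))
        while [ "$j" -le "$n" ]; do
            echo "domU$i before domU$j"
            total=$((total + 1))
            j=$((j + 1))
        done
        i=$((i + 1))
    done
    echo "total: $total"
}

# Show only the summary line for the 14 domU case.
count_pairs 14 | tail -n 1   # prints "total: 91"
```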
Instead, I added a lock on startup: if more than one domU wants to start at
the same time, the first one creates a lock directory, and the others wait
until the directory has disappeared and then start.
Because of this lock, a waiting domU may reach the start timeout. To
mitigate this, I added a start operation with a timeout of 120s to each Xen
resource, and have not hit that timeout since.
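A minimal sketch of the directory lock idea, not the appended patch itself. The names LOCKDIR, xen_start_lock and xen_start_unlock, the default path, and the 100s default wait are all assumptions made for illustration. Unlike the description above, waiters here retry mkdir instead of only watching for the directory to disappear, so that two waiters cannot both proceed at once.

```shell
#!/bin/sh
# Hypothetical lock location; the real patch may use a different path.
LOCKDIR="${LOCKDIR:-${TMPDIR:-/tmp}/Xen.lock}"

# Take the startup lock, waiting while another domU start holds it.
# $1 = maximum seconds to wait (assumption: kept below the 120s start timeout).
# Returns 0 once the lock is held, 1 if the wait limit was reached.
xen_start_lock() {
    max_wait=${1:-100}
    waited=0
    # mkdir either creates the directory or fails, so only one caller wins.
    until mkdir "$LOCKDIR" 2>/dev/null; do
        if [ "$waited" -ge "$max_wait" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Release the lock so the next waiting domU can start.
xen_start_unlock() {
    rmdir "$LOCKDIR" 2>/dev/null
}
```

A start action would call xen_start_lock before creating the domU and xen_start_unlock afterwards, including on the error paths, so that a failed start does not leave the directory behind.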
Appended is a patch with the locks that I added, which works well for me, but
I suspect it may not be perfect, so input is welcome.
Some questions:
- Is there a better location for the lock directory, e.g. $OCF_ROOT/Xen.lock ?
- Should startup locking only be enabled if memory management is enabled?
Because I now use the locking instead of the one-to-one order constraints, the
whole startup of the cluster is faster: the ordering was cluster-wide, whereas
the locking only applies to the domUs on one node.
Any input is appreciated.
Regards,
Sebastian
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/