07.02.2014 14:22, Asgaroth wrote:
...
>
> Thanks for the explanation, this is interesting for me as I need a
> volume manager in the cluster to manage the shared file systems in case
> I need to resize for some reason. I think I may be coming up against
> something similar now that I am testing cman outside of the cluster:
> even though I have cman/clvmd enabled outside pacemaker, the clvmd daemon
> still hangs even when the 2nd node has been rebooted due to a fence
> operation. When it (node 2) reboots, cman & clvmd start and I can see both
> nodes as members using cman_tool, but clvmd still seems to have an
> issue; it just hangs. I can't see offhand if dlm still thinks pacemaker
> is in the fence operation (or if it has already returned true for a
> successful fence). I am still gathering logs and will post back to this
> thread once I have all my logs from yesterday and this morning.

As I wrote (maybe it was not completely clear), there are two points where clustered LVM may block: dlm (the kern_stop flag in 'dlm_tool ls' output) and clvmd itself (when not all cluster nodes run clvmd). Of course there could be additional bugs. I'd break fencing for your node1 and look at what dlm_tool shows there after node2 is fenced. 'dlm_tool ls' and 'dlm_tool dump' should provide enough information (but you'd probably need to dig into the dlm_controld code to fully interpret the latter). Also, you may want to run clvmd in debugging mode.

>
> I don't suppose there is another volume manager available that would be
> cluster aware that anyone is aware of?

I'm not aware of any.

>
>>
>> Increasing the timeout for the LSB clvmd resource probably won't help you,
>> because LVM operations that are blocked (because DLM waits for fencing)
>> IIRC never finish.
>>
>> You may want to search for a clvmd OCF resource agent; it is available for
>> SUSE, I think. Although it is not perfect, it should work much better for
>> you.
>
> I will have a look around for this clvmd OCF agent, and see what is
> involved in getting it to work on CentOS 6.5 if I don't have any success
> with the current recommendation for running it outside of pacemaker
> control.

Generally, that alone won't help, because you'll still get timeouts on every LVM operation if some of the cman nodes do not run clvmd for any reason (I mean, if you manage VGs/LVs as cluster resources). But combined with a newer stack, it removes one point of failure. I know that the latest versions of the cluster-stack software (those which require corosync2 and its quorum implementation) work together like a charm, and there was a REASON to write them (and to use them in RHEL7).
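
For the first check mentioned above (the kern_stop flag in 'dlm_tool ls' output), here is a minimal Python sketch, not part of the original advice, that lists lockspaces whose kernel side is stopped. The output layout it parses (a "name <lockspace>" line followed by a "flags 0x... kern_stop" line) is an assumption and may differ between dlm versions; the function names are hypothetical, and read_dlm_ls() needs Python 3.7+.

```python
import subprocess
from typing import Dict, List, Optional


def read_dlm_ls() -> str:
    """Run `dlm_tool ls` on this node and return its stdout (Python 3.7+)."""
    return subprocess.run(["dlm_tool", "ls"], capture_output=True,
                          text=True, check=True).stdout


def parse_dlm_ls(text: str) -> Dict[str, List[str]]:
    """Map each lockspace name to the symbolic words on its 'flags' line.

    ASSUMPTION: one block per lockspace, with a line starting with "name"
    and a later line starting with "flags" (hex value, then flag names).
    """
    flags: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "name" and len(parts) >= 2:
            current = parts[1]
            flags[current] = []
        elif parts[0] == "flags" and current is not None:
            # Drop the hex value; keep symbolic names such as "kern_stop".
            flags[current] = [p for p in parts[1:] if not p.startswith("0x")]
    return flags


def stopped_lockspaces(text: str) -> List[str]:
    """Lockspaces with kern_stop set, i.e. dlm is holding them stopped."""
    return [name for name, f in parse_dlm_ls(text).items() if "kern_stop" in f]
```

With fencing broken on node1 and node2 fenced, a non-empty stopped_lockspaces(read_dlm_ls()) on node1 would suggest dlm is still waiting. An empty result would point more toward clvmd itself, where running it in debugging mode is the next step.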

>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org