[linux-cluster@ isn't really used nowadays; CCing users@clusterlabs]
On 08/05/18 12:18, Jason Gauthier wrote:
Greetings, I'm working on a setup of a two-node cluster with shared storage. I've been able to see the storage on both nodes, and I have appropriate configuration in place for fencing the block device. The next step was getting DLM and GFS2 in a clone group to mount the FS on both nodes. This is where I am running into trouble. As far as the OS goes, it's Debian. I'm using pacemaker, corosync, and crm for cluster management.
Is it safe to assume that you're using Debian Wheezy? (The need for gfs_controld disappeared in the 3.3 kernel.) As Wheezy goes end-of-life at the end of the month, I would suggest upgrading; you will likely find the cluster tools more user-friendly and the components more stable.
Andy
At the moment, I've removed the gfs2 parts just to try and get dlm working. My current config looks like this:

    node 1084772368: alpha
    node 1084772369: beta
    primitive p_dlm_controld ocf:pacemaker:controld \
        op monitor interval=60 timeout=60 \
        meta target-role=Started args=-K
    primitive p_gfs_controld ocf:pacemaker:controld \
        params daemon=gfs_controld \
        meta target-role=Started
    primitive stonith_sbd stonith:external/sbd \
        params pcmk_delay_max=30 sbd_device="/dev/sdb1"
    group g_gfs2 p_dlm_controld p_gfs_controld
    clone cl_gfs2 g_gfs2 \
        meta interleave=true target-role=Started
    property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.16-94ff4df \
        cluster-infrastructure=corosync \
        cluster-name=zeta \
        last-lrm-refresh=1525523370 \
        stonith-enabled=true \
        stonith-timeout=20s

When I bring the resources up, I get a quick blip in my logs.

    May 8 07:13:58 beta dlm_controld[9425]: 253556 dlm_controld 4.0.7 started
    May 8 07:14:00 beta kernel: [253558.641658] dlm: closing connection to node 1084772369
    May 8 07:14:00 beta kernel: [253558.641764] dlm: closing connection to node 1084772368

This is the same messaging I see when I run dlm manually and then stop it. My challenge here is that I cannot find out what dlm is doing. I've tried adding -K to /etc/default/dlm, but I don't think that file is being respected. I would like to figure out how to increase the verbosity of dlm_controld's output so I can see why it won't stay running when it's launched through the cluster. I haven't been able to figure out how to pass arguments directly to the daemon in the primitive config, if it's even possible. Otherwise, I would try to pass -K there.

Thanks!
Jason
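
On passing arguments to the daemon: as far as I know, the ocf:pacemaker:controld agent reads extra daemon options from an instance parameter called args, so they would go under params rather than meta as in the config above. A minimal sketch, assuming -K is the flag you want and reusing the resource name from your config (run "crm ra info ocf:pacemaker:controld" to confirm the parameter list for your version):

```
# Sketch only: args moved from "meta" to "params".
# Assumption: -K is the dlm_controld option you are after.
primitive p_dlm_controld ocf:pacemaker:controld \
    params args="-K" \
    op monitor interval=60 timeout=60 \
    meta target-role=Started
```

Setting args replaces whatever default value the agent ships with, so it is worth checking that default first and keeping any options you still need.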