Hi, I have a GFS cluster (4 nodes + qdisk on SAN) and have problems shutting down the cman service and unmounting the GFS mount points: it causes the shutdown to hang. I am running GFS and CLVM (the LVs are Xen guest drives). If I try to shut down the cman service manually, I get an error that resources are still in use. One GFS directory is exported via NFS.
I think it may be because of the service stop order, specifically openais stopping before cman. Could this be a valid reason? The runlevel 6 kill scripts are:

K00xendomains
K01xend
K03libvirtd
K20nfs
K20openais
K74gfs
K74gfs2
K76clvmd
K78qdiskd
K79cman
K86nfslock

If I manually run through the stopping of these services, the gfs service hangs. This is the log:

Feb 13 10:35:25 vmhost-01 gfs_controld[3227]: cluster is down, exiting
Feb 13 10:35:25 vmhost-01 dlm_controld[3221]: cluster is down, exiting
Feb 13 10:35:25 vmhost-01 fenced[3215]: cluster is down, exiting
Feb 13 10:35:25 vmhost-01 kernel: dlm: closing connection to node 4
Feb 13 10:35:25 vmhost-01 kernel: dlm: closing connection to node 3
Feb 13 10:35:25 vmhost-01 kernel: dlm: closing connection to node 2
Feb 13 10:35:25 vmhost-01 kernel: dlm: closing connection to node 1
Feb 13 10:35:27 vmhost-01 qdiskd[3201]: <err> cman_dispatch: Host is down
Feb 13 10:35:27 vmhost-01 qdiskd[3201]: <err> Halting qdisk operations
Feb 13 10:35:51 vmhost-01 ccsd[3165]: Unable to connect to cluster infrastructure after 30 seconds.
Feb 13 10:36:13 vmhost-01 mountd[3927]: Caught signal 15, un-registering and exiting.
Feb 13 10:36:13 vmhost-01 kernel: nfsd: last server has exited
Feb 13 10:36:13 vmhost-01 kernel: nfsd: unexporting all filesystems
Feb 13 10:36:21 vmhost-01 ccsd[3165]: Unable to connect to cluster infrastructure after 60 seconds.

ccsd continues to repeat the last message with an increasing time: 60s, 90s, 120s, 180s, 210s, etc.

dmesg shows:

dlm: closing connection to node 4
dlm: closing connection to node 3
dlm: closing connection to node 2
dlm: closing connection to node 1

There are no open files on GFS (according to lsof). I am using gfs (1).

The only workaround I have now is to reset the nodes via ILO once the shutdown process starts (and hangs on either the gfs or the cman service stop).
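To make the ordering question concrete, here is a minimal Python sketch that takes the kill-script names quoted in this post and prints the order in which they would be stopped. It assumes the usual SysV behaviour of running the K links in lexical order. The helper names `stop_order` and `stops_before` are made up for illustration and are not part of any cluster tool. It only shows the ordering and does not prove that the ordering is the cause of the hang.

```python
import re

# Kill-script names copied from the runlevel 6 list in this post.
KILL_SCRIPTS = (
    "K00xendomains K01xend K03libvirtd K20nfs K20openais "
    "K74gfs K74gfs2 K76clvmd K78qdiskd K79cman K86nfslock"
).split()


def stop_order(names):
    """Return service names in the order the K links would run.

    Assumption: the rc script runs the K links in lexical order,
    so a lower number means the service is stopped earlier.
    """
    parsed = []
    for name in names:
        match = re.fullmatch(r"K(\d\d)(.+)", name)
        if match is None:
            raise ValueError("not a kill-script name: %r" % name)
        parsed.append((name, match.group(2)))
    parsed.sort(key=lambda pair: pair[0])  # lexical sort on the link name
    return [service for _, service in parsed]


def stops_before(order, first, second):
    """True if `first` is stopped earlier than `second` in `order`."""
    return order.index(first) < order.index(second)


order = stop_order(KILL_SCRIPTS)
print("stop order:", " -> ".join(order))

# Pairs relevant to the question: is openais stopped ahead of the
# services that still need the cluster infrastructure?
for earlier, later in [("openais", "gfs"), ("openais", "clvmd"), ("openais", "cman")]:
    print("%s stops before %s: %s" % (earlier, later, stops_before(order, earlier, later)))
```

With the list above, openais (K20) comes before gfs (K74), clvmd (K76) and cman (K79), which would be consistent with the "cluster is down, exiting" messages appearing before the gfs stop is attempted.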
--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster
