Simple 4-node cluster; two of the nodes have had a GFS shared home directory mounted for over a month. Today I wanted to mount /home on a third node, so:
# service fenced start
[failed]

Weird. Checking /var/log/messages shows:

Aug 11 10:19:06 cerberus kernel: Lock_Harness 2.6.9-80.9.el4_7.10 (built Jan 22 2009 18:39:16) installed
Aug 11 10:19:06 cerberus kernel: GFS 2.6.9-80.9.el4_7.10 (built Jan 22 2009 18:39:32) installed
Aug 11 10:19:06 cerberus kernel: GFS: Trying to join cluster "lock_dlm", "ccc_cluster47:home"
Aug 11 10:19:06 cerberus kernel: Lock_DLM (built Jan 22 2009 18:39:18) installed
Aug 11 10:19:06 cerberus kernel: lock_dlm: fence domain not found; check fenced
Aug 11 10:19:06 cerberus kernel: GFS: can't mount proto = lock_dlm, table = ccc_cluster47:home, hostdata =

# cman_tool services
Service          Name                      GID  LID  State  Code
Fence Domain:    "default"                   0    2  join   S-2,2,1
[]

So, a fenced process is now hung:

root     28302  0.0  0.0   3668   192 ?  Ss  10:19  0:00  fenced -t 120 -w

Q: Any idea how to "recover" from this state, without rebooting?

The other two servers are unaffected by this (thankfully) and show normal operations:

$ cman_tool services
Service          Name                      GID  LID  State  Code
Fence Domain:    "default"                   2    2  run    -
[1 12]

DLM Lock Space:  "home"                      5    5  run    -
[1 12]

GFS Mount Group: "home"                      6    6  run    -
[1 12]
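For anyone looking at the same symptom, a minimal inspection sequence on the stuck node would look roughly like the following. cman_tool and fence_tool ship with the cluster suite alongside fenced; the kill-and-rejoin step at the end is a hypothetical recovery attempt, not a confirmed fix (PID 28302 is the hung fenced shown above), and whether it is safe without rebooting is exactly the open question:

```shell
# Inspect what the stuck node can see of the cluster
cman_tool status      # overall cluster/quorum state
cman_tool nodes       # membership as seen from this node
cman_tool services    # fence domain stuck in "join S-2,2,1" as above

# Hypothetical recovery attempt: kill the hung fenced daemon,
# then retry joining the fence domain and starting the service.
kill 28302            # PID of the hung "fenced -t 120 -w" from ps above
fence_tool join       # fence_tool is the usual front-end for fence domain join/leave
service fenced start
```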
-- Linux-cluster mailing list [email protected] https://www.redhat.com/mailman/listinfo/linux-cluster
