Sorry to resurrect this issue. I'm making some progress on it. When I do a "ccm_tool -V" on the machine that works I get

[EMAIL PROTECTED] heartbeat]# ccm_tool -V
ccm_tool[763]: 2006/08/30_18:47:07 debug: ccm_age_connect:ccm_epoche.c Registering with CCM
ccm_tool[763]: 2006/08/30_18:47:07 debug: ccm_age_connect:ccm_epoche.c Setting up CCM callbacks
ccm_tool[763]: 2006/08/30_18:47:07 debug: ccm_age_connect:ccm_epoche.c Activating CCM token
ccm_tool[763]: 2006/08/30_18:47:07 info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
ccm_tool[763]: 2006/08/30_18:47:07 info: mem_handle_event: instance=1, nodes=1, new=1, lost=0, n_idx=0, new_idx=0, old_idx=3

but on the machine that fails I get

[EMAIL PROTECTED] heartbeat]# ccm_tool -V
ccm_tool[4543]: 2006/08/30_18:47:37 debug: ccm_age_connect:ccm_epoche.c Registering with CCM
ccm_tool[4543]: 2006/08/30_18:47:37 debug: ccm_age_connect:ccm_epoche.c Setting up CCM callbacks
ccm_tool[4543]: 2006/08/30_18:47:37 debug: ccm_age_connect:ccm_epoche.c Activating CCM token
ccm_tool[4543]: 2006/08/30_18:47:37 WARN: ccm_age_connect:ccm_epoche.c CCM Activation failed

The "CCM Activation failed" isn't good. Anyways, I straced it and noticed that the good machine has

[EMAIL PROTECTED] heartbeat]# ls -ld /var/run/heartbeat/*
drwxr-xr-x 2 hacluster haclient 4096 Aug 23 20:12 /var/run/heartbeat/ccm
drwxr-x--- 2 hacluster haclient 4096 Aug 23 20:12 /var/run/heartbeat/crm
srwxrwxrwx 1 root root 0 Aug 23 20:12 /var/run/heartbeat/lrm_callback_sock
srwxrwxrwx 1 root root 0 Aug 23 20:12 /var/run/heartbeat/lrm_cmd_sock
srwxrwxrwx 1 root root 0 Aug 23 20:12 /var/run/heartbeat/register
drwxr-xr-t 4 root root 4096 Aug 23 20:12 /var/run/heartbeat/rsctmp
srwxrwxrwx 1 root root 0 Aug 23 20:12 /var/run/heartbeat/stonithd
srwxrwxrwx 1 root root 0 Aug 23 20:12 /var/run/heartbeat/stonithd_callback

but the bad machine doesn't have these sockets.

[EMAIL PROTECTED] heartbeat]# ls -ld /var/run/heartbeat/*
drwxr-xr-x 2 hacluster haclient 4096 Aug 23 19:54 /var/run/heartbeat/ccm
drwxr-x--- 2 hacluster haclient 4096 Aug 23 19:54 /var/run/heartbeat/crm
drwxr-xr-t 2 hacluster haclient 4096 Aug 30 17:44 /var/run/heartbeat/rsctmp

This explains why I'm getting the message

heartbeat[4497]: 2006/08/30_18:44:13 ERROR: Message hist queue is filling up (151 messages in queue)

in my log file. The socket isn't being read!

Does anyone know where these sockets are being created? When I run heartbeat it isn't creating them which is very strange.

_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
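
P.S. For anyone who wants to repeat the comparison on their own nodes, here is a minimal sketch in Python. It is not part of heartbeat; the function name missing_sockets is made up for illustration, and the list of socket names is simply copied from the "ls -ld" output on the working machine above, so it may differ on other versions or configurations. It only reports which of those names are absent or are not UNIX sockets; it does not say which daemon should have created them.

```python
import os
import stat

# Socket names copied from the `ls -ld /var/run/heartbeat/*` output on the
# working machine. This list is an assumption for illustration, not an
# authoritative inventory of what heartbeat creates.
EXPECTED_SOCKETS = [
    "lrm_callback_sock",
    "lrm_cmd_sock",
    "register",
    "stonithd",
    "stonithd_callback",
]


def missing_sockets(directory="/var/run/heartbeat", expected=EXPECTED_SOCKETS):
    """Return the expected names that are absent or are not UNIX sockets."""
    missing = []
    for name in expected:
        path = os.path.join(directory, name)
        try:
            # lstat so a symlink is not silently followed.
            mode = os.lstat(path).st_mode
        except OSError:
            # Path does not exist (or cannot be examined at all).
            missing.append(name)
            continue
        if not stat.S_ISSOCK(mode):
            # Something is there, but it is not a socket.
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_sockets():
        print("missing socket:", name)
```

Run on each node, an empty result would mean all five names exist as sockets; on the failing machine described above it would presumably list all five.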
