Sorry to resurrect this issue, but I'm making some progress on it. When
I run "ccm_tool -V" on the machine that works, I get:

[EMAIL PROTECTED] heartbeat]# ccm_tool -V
ccm_tool[763]: 2006/08/30_18:47:07 debug: ccm_age_connect:ccm_epoche.c Registering with CCM
ccm_tool[763]: 2006/08/30_18:47:07 debug: ccm_age_connect:ccm_epoche.c Setting up CCM callbacks
ccm_tool[763]: 2006/08/30_18:47:07 debug: ccm_age_connect:ccm_epoche.c Activating CCM token
ccm_tool[763]: 2006/08/30_18:47:07 info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
ccm_tool[763]: 2006/08/30_18:47:07 info: mem_handle_event: instance=1, nodes=1, new=1, lost=0, n_idx=0, new_idx=0, old_idx=3

but on the machine that fails, I get:

[EMAIL PROTECTED] heartbeat]# ccm_tool -V
ccm_tool[4543]: 2006/08/30_18:47:37 debug: ccm_age_connect:ccm_epoche.c Registering with CCM
ccm_tool[4543]: 2006/08/30_18:47:37 debug: ccm_age_connect:ccm_epoche.c Setting up CCM callbacks
ccm_tool[4543]: 2006/08/30_18:47:37 debug: ccm_age_connect:ccm_epoche.c Activating CCM token
ccm_tool[4543]: 2006/08/30_18:47:37 WARN: ccm_age_connect:ccm_epoche.c CCM Activation failed

The "CCM Activation failed" warning is clearly the problem. Anyway, I
straced ccm_tool and noticed that the good machine has

[EMAIL PROTECTED] heartbeat]# ls -ld /var/run/heartbeat/*
drwxr-xr-x  2 hacluster haclient 4096 Aug 23 20:12 /var/run/heartbeat/ccm
drwxr-x---  2 hacluster haclient 4096 Aug 23 20:12 /var/run/heartbeat/crm
srwxrwxrwx  1 root      root        0 Aug 23 20:12 /var/run/heartbeat/lrm_callback_sock
srwxrwxrwx  1 root      root        0 Aug 23 20:12 /var/run/heartbeat/lrm_cmd_sock
srwxrwxrwx  1 root      root        0 Aug 23 20:12 /var/run/heartbeat/register
drwxr-xr-t  4 root      root     4096 Aug 23 20:12 /var/run/heartbeat/rsctmp
srwxrwxrwx  1 root      root        0 Aug 23 20:12 /var/run/heartbeat/stonithd
srwxrwxrwx  1 root      root        0 Aug 23 20:12 /var/run/heartbeat/stonithd_callback

but the bad machine has none of these sockets:

[EMAIL PROTECTED] heartbeat]# ls -ld /var/run/heartbeat/*
drwxr-xr-x  2 hacluster haclient 4096 Aug 23 19:54 /var/run/heartbeat/ccm
drwxr-x---  2 hacluster haclient 4096 Aug 23 19:54 /var/run/heartbeat/crm
drwxr-xr-t  2 hacluster haclient 4096 Aug 30 17:44 /var/run/heartbeat/rsctmp
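
For anyone who wants to run the same comparison on their own nodes, here
is a minimal sketch of the check. The socket names are simply the five
that show up on my working machine; I'm assuming (not certain) that the
same set should exist on every node. missing_sockets and HB_RUN_DIR are
names I made up for this example, not part of heartbeat.

```shell
#!/bin/sh
# Print the name of every expected heartbeat socket that is missing
# from the given directory (default: /var/run/heartbeat).
# ASSUMPTION: the expected names are copied from the listing on the
# working machine; they are not taken from the heartbeat sources.
missing_sockets() {
    dir=${1:-/var/run/heartbeat}
    for name in lrm_callback_sock lrm_cmd_sock register \
                stonithd stonithd_callback; do
        # -S is true only if the path exists and is a socket
        [ -S "$dir/$name" ] || echo "$name"
    done
}

# HB_RUN_DIR is a hypothetical override for testing against another path.
missing_sockets "${HB_RUN_DIR:-/var/run/heartbeat}"
```

On the good machine this should print nothing; on the bad one it should
list all five names.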

This explains why I'm getting the message

heartbeat[4497]: 2006/08/30_18:44:13 ERROR: Message hist queue is filling up (151 messages in queue)

in my log file. Nothing is reading from the socket! Does anyone know
where these sockets get created? When I run heartbeat on the bad
machine, it doesn't create them, which is very strange.

_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
