On Wed, Mar 4, 2009 at 17:15, Harakiri <[email protected]> wrote:
>
> Thanks for answering,
>
>
> --- On Wed, 3/4/09, Andrew Beekhof <[email protected]> wrote:
>
>>
>> crm_mon takes other things into account,
>> but without logs or the current cib it's impossible to say for sure why
>> this is happening.
>
>
> After a reboot or restart, the following log information is found in ha-debug:
>
> http://pastebin.com/m7d9c71f7

not enough information - there's practically nothing from the crmd

>
> Note that the only error is:
>
> mgmtd[5612]: 2009/03/04_16:58:25 ERROR: socket_client_channel_new: open(/var/lib/heartbeat/run/heartbeat/lrm_cmd_sock, ...) failure: No such file or directory
>
> but it exists; it's probably a race condition and the file is created later:
>
> ls -la /var/lib/heartbeat/run/heartbeat/lrm_cmd_sock
> prwxrwxrwx   1 root     root           0 Mar  4 16:58 /var/lib/heartbeat/run/heartbeat/lrm_cmd_sock|

unrelated

> At this point, cibadmin etc. will not work and hang because they can't seem to
> connect to the crmd; crm_mon will report the node as offline.

cibadmin connects to the cib, not the crmd.
however, by default it does try to connect to the instance on the DC,
and it appears you don't have one of those (though I've no idea why,
because you didn't include enough logs).

> After killing crmd the following log information is found:
>
> http://pastebin.com/m29a3ec9d
>
> crmd[5644]: 2009/03/04_17:06:29 info: do_cib_control: CIB connection established
>
> etc
>
> So it seems that crmd does not initialize correctly on the initial start.
> Maybe the cib process has to be started before crmd?

it always is

heartbeat[5596]: 2009/03/04_16:58:24 info: Starting child client "/opt/heartbeat/lib/heartbeat/cib" (17,65)
heartbeat[5596]: 2009/03/04_16:58:24 info: Starting child client "/opt/heartbeat/lib/heartbeat/lrmd -r" (0,0)
heartbeat[5596]: 2009/03/04_16:58:24 info: Starting child client "/opt/heartbeat/lib/heartbeat/stonithd" (0,0)
heartbeat[5596]: 2009/03/04_16:58:24 info: Starting child client "/opt/heartbeat/lib/heartbeat/attrd" (17,65)
heartbeat[5596]: 2009/03/04_16:58:24 info: Starting child client "/opt/heartbeat/lib/heartbeat/crmd" (17,65)

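for anyone wanting to confirm the start order on their own node, a small
sketch along these lines could pull it out of ha-debug. it assumes the
"Starting child client" wording shown in the log excerpt; the function names
are made up for illustration and are not part of heartbeat.

```python
import re

# Matches heartbeat's "Starting child client" entries. The quoted command may
# sit on the following line when the log has been wrapped by a mail client,
# so any whitespace (including a newline) is allowed before the quote.
CHILD_RE = re.compile(r'Starting child client\s+"([^"]+)"')


def child_start_order(log_text):
    """Return the child commands in the order heartbeat logged starting them."""
    return [m.group(1) for m in CHILD_RE.finditer(log_text)]


def starts_before(log_text, first, second):
    """True if the binary named `first` is logged before the one named `second`.

    Only the basename of each command is compared, so "lrmd -r" is seen as "lrmd".
    Raises ValueError if either name does not appear in the log.
    """
    names = [cmd.split()[0].rsplit("/", 1)[-1]
             for cmd in child_start_order(log_text)]
    return names.index(first) < names.index(second)
```

calling starts_before(open("ha-debug").read(), "cib", "crmd") on the log would
then say whether the cib was launched ahead of the crmd (the file name here is
only an example).
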
> Maybe it's related to the fact that under Solaris SPARC, PIPES are used
> instead of sockets for communication.

that is almost certainly the primary issue.
if this isn't working then the cib/crmd are also going to be unable to
connect to heartbeat (and so can't communicate with the other nodes)

in short, until the IPC code works, nothing will.
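
to see which kind of IPC endpoint a given build actually created (the leading
"p" in the ls output quoted earlier marks a named pipe rather than a socket),
a check along these lines may help. it is only a sketch using Python's
standard os and stat modules; ipc_endpoint_kind is a made-up helper name, not
something heartbeat ships.

```python
import os
import stat


def ipc_endpoint_kind(path):
    """Classify a filesystem path as 'fifo', 'socket', 'missing' or 'other'."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        # The endpoint has not been created (yet), as in the mgmtd error.
        return "missing"
    if stat.S_ISFIFO(mode):
        return "fifo"      # named pipe, shown as 'p' by ls -l
    if stat.S_ISSOCK(mode):
        return "socket"    # UNIX domain socket, shown as 's' by ls -l
    return "other"
```

running it against /var/lib/heartbeat/run/heartbeat/lrm_cmd_sock shortly
after startup, and again a few seconds later, would show whether the endpoint
is missing at first and which type it ends up being.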

> PIPES were introduced because of this patch:
>
> http://www.mail-archive.com/[email protected]/msg00307.html
>
> Since I have Solaris 10 I tried to use streams, but I can't find ucred.h
> anywhere for Solaris.
>
> Any ideas? How can I change the order of the "Starting child client" entries?
>
> Thanks
>
>
>
>
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
