On Wed, Mar 4, 2009 at 17:15, Harakiri <[email protected]> wrote:
>
> Thanks for answering,
>
> --- On Wed, 3/4/09, Andrew Beekhof <[email protected]> wrote:
>
>> crm_mon takes other things into account.
>> but without logs or the current cib it's impossible to say for sure why
>> this is happening.
>
> after a reboot or restart, the following log information is found in ha-debug
>
> http://pastebin.com/m7d9c71f7

not enough information - there's practically nothing from the crmd

> note the only error is:
>
> mgmtd[5612]: 2009/03/04_16:58:25 ERROR: socket_client_channel_new:
> open(/var/lib/heartbeat/run/heartbeat/lrm_cmd_sock, ...) failure: No such
> file or directory
>
> but it exists - it's probably a race condition and created later:
>
> ls -la /var/lib/heartbeat/run/heartbeat/lrm_cmd_sock
> prwxrwxrwx 1 root root 0 Mar 4 16:58
> /var/lib/heartbeat/run/heartbeat/lrm_cmd_sock|

unrelated

> At this point, cibadmin etc. will not work and hang because they can't seem to
> connect to the crmd; crm_mon will indicate the node as offline

cibadmin connects to the cib, not the crmd.
however, by default it does try to connect to the instance on the DC - and
it appears you don't have one of those (though i've no idea why, because you
didn't include enough logs).

> After killing crmd the following log information is found:
>
> http://pastebin.com/m29a3ec9d
>
> crmd[5644]: 2009/03/04_17:06:29 info: do_cib_control: CIB connection
> established
>
> etc
>
> So it seems that on the initial start crmd does not correctly initialize,
> maybe the cib process has to be started before crmd?

it always is

heartbeat[5596]: 2009/03/04_16:58:24 info: Starting child client "/opt/heartbeat/lib/heartbeat/cib" (17,65)
heartbeat[5596]: 2009/03/04_16:58:24 info: Starting child client "/opt/heartbeat/lib/heartbeat/lrmd -r" (0,0)
heartbeat[5596]: 2009/03/04_16:58:24 info: Starting child client "/opt/heartbeat/lib/heartbeat/stonithd" (0,0)
heartbeat[5596]: 2009/03/04_16:58:24 info: Starting child client "/opt/heartbeat/lib/heartbeat/attrd" (17,65)
heartbeat[5596]: 2009/03/04_16:58:24 info: Starting child client "/opt/heartbeat/lib/heartbeat/crmd" (17,65)

> Maybe it's related to the issue that under solaris sparc PIPES are used
> instead of sockets for communication

that is almost certainly the primary issue.
if this isn't working then the cib/crmd are also going to be unable to
connect to heartbeat (and so can't communicate with the other nodes)

in short, until the IPC code works, nothing will.

> PIPES were introduced because of this patch
>
> http://www.mail-archive.com/[email protected]/msg00307.html
>
> since i have solaris 10 i tried to use streams but i don't find the ucred.h
> anywhere for solaris.
>
> Any ideas? How can i modify the "Starting child client" in different order?
>
> Thanks

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
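
for what it's worth, the leading `p` in the quoted `ls -la` output means lrm_cmd_sock is a named pipe (FIFO), not a unix socket, which fits the PIPES build mentioned in the thread. a minimal sketch for checking what kind of IPC endpoint a path really is; the helper name `ipc_endpoint_kind` is made up for illustration and is not part of heartbeat:

```python
import os
import stat


def ipc_endpoint_kind(path):
    """Classify a path as 'socket', 'fifo', 'missing', or 'other'.

    Hypothetical helper for illustration only; not a heartbeat tool.
    """
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        # The endpoint has not been created (yet).
        return "missing"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISFIFO(mode):
        return "fifo"
    return "other"


if __name__ == "__main__":
    # Path taken from the error message quoted in the thread.
    path = "/var/lib/heartbeat/run/heartbeat/lrm_cmd_sock"
    print(path, "->", ipc_endpoint_kind(path))
```

running this right after heartbeat starts, and again a few seconds later, would show whether the endpoint is briefly missing at startup and whether it comes up as a FIFO or a socket. it only inspects the file type; it does not test whether a client can actually connect.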
