--- On Thu, 3/5/09, Andrew Beekhof <[email protected]> wrote:
> From: Andrew Beekhof <[email protected]>
> Subject: Re: [Linux-HA] crm_mon vs cl_status
> To: [email protected]
> Cc: "Linux-HA mailing list" <[email protected]>
> Date: Thursday, March 5, 2009, 6:46 AM
> On Mar 5, 2009, at 12:39 PM, Harakiri wrote:
> >>
> >> YES it _is_.
> >> The log messages above indicate the order heartbeat starts them in -
> >> anything after that is up to the scheduler of your OS.
> >>
> >> Regardless, the crmd and cib both have loops that retry opening
> >> connections to the services they require - with the possible exception
> >> of the cluster itself.
> >
> > But these loops don't work - as I said, on other systems like Debian the
> > processes are executed in the right order, but not here.
> >
> > I can manually fix the opening of pipes by adding a while loop to
> > ipcsocket.c for when the pipe does not exist yet. If they would loop
> > themselves to try again, why isn't it working? I don't see any reference
> > to a loop around
> >
> > struct IPC_CHANNEL *
> > socket_client_channel_new(GHashTable *ch_attrs)
> >
> > Where is it?
>
> The loops I'm talking about are at a much higher level - I have no
> knowledge of how the IPC code works.
> E.g. do_cib_control() arranges for the crmd to try connecting to the cib
> up to 30 times before giving up.
>
> It sounds like the Solaris equivalent of socket_client_channel_new()
> isn't failing properly.
Yes - when I compile on sparc10 with sockets enabled instead of pipes, the
loops work:
cib[19975]: 2009/03/05_13:13:35 WARN: ccm_connect: CCM Activation failed
cib[19975]: 2009/03/05_13:13:35 WARN: ccm_connect: CCM Connection failed 1 times (30 max)
cib[19975]: 2009/03/05_13:13:38 WARN: ccm_connect: CCM Activation failed
cib[19975]: 2009/03/05_13:13:38 WARN: ccm_connect: CCM Connection failed 2 times (30 max)
But this never happens when pipes are used. Since pipes are also controlled
in the same socket_client_channel_new(), there is no difference: if either
the socket or the pipe fails, NULL is returned. I found the retry code in
crm/crmd/ccm.c, but I have no idea why it would fail - maybe an exception is
thrown somewhere in between?!
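
To make the expected behaviour concrete, here is a minimal sketch of a bounded
retry loop of the kind discussed in this thread. It is not the actual heartbeat,
crmd or cib code: connect_with_retry(), connect_fn and fake_connect() are
hypothetical names made up for illustration, the warning text only imitates the
log lines above, and the delay between attempts is a parameter (the timestamps
above suggest roughly 3 seconds). The point is that such a loop can only retry
if the low-level connect call actually returns NULL on failure instead of
blocking or aborting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical connect callback: returns a non-NULL handle on success and
 * NULL on failure, which is how socket_client_channel_new() is described
 * in this thread. */
typedef void *(*connect_fn)(void *arg);

/* Try to connect up to max_attempts times, sleeping delay_sec seconds
 * between failed attempts. Returns the handle from the first successful
 * attempt, or NULL if every attempt failed. */
void *
connect_with_retry(connect_fn try_connect, void *arg,
                   int max_attempts, unsigned int delay_sec)
{
    int attempt;

    assert(try_connect != NULL && max_attempts > 0);

    for (attempt = 1; attempt <= max_attempts; attempt++) {
        void *handle = try_connect(arg);

        if (handle != NULL) {
            return handle;
        }
        /* Illustrative message only, modelled on the log output above. */
        fprintf(stderr, "WARN: connection failed %d times (%d max)\n",
                attempt, max_attempts);
        if (attempt < max_attempts) {
            sleep(delay_sec);   /* no need to wait after the last attempt */
        }
    }
    return NULL;
}

/* Illustration-only callback: fails until the counter behind arg reaches
 * zero, then returns a dummy non-NULL handle. */
void *
fake_connect(void *arg)
{
    static int dummy_handle;
    int *failures_left = arg;

    if (*failures_left > 0) {
        (*failures_left)--;
        return NULL;
    }
    return &dummy_handle;
}
```

With a counter of 2, connect_with_retry(fake_connect, &counter, 30, 3) would
print two warnings and then succeed on the third attempt. If the real connect
call never returns NULL in the pipe case, a loop like this never gets the
chance to run a second attempt, which would match what I see here.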