On 9/13/07, Yan Fitterer <[EMAIL PROTECTED]> wrote:
>
> Junko IKEDA wrote:
> >>> Once again, something about SplitBrain...
> >>> During SplitBrain, I wrecked the resource on both nodes.
> >>> The fail count was increased at that time.
> >>> But after recovering from SplitBrain, the fail count returned to
> >>> zero on both!
> >>> Is this due to the restart of crmd or pengine/tengine?
> >> Most probably. The fail count belongs to the status section, which
> >> is not saved.
> >
> > Where is the status section saved?
>
> The status section is never saved to disk. When the cluster is stopped,
> the status section disappears altogether.
>
> > I thought that the CIB kept the status.
>
> Yes, it does. But status has no meaning once the cluster is stopped, so
> it isn't kept. Hence fail counts are reset when the cluster is
> restarted. Likewise, the fail count for a specific node is reset when
> _that_ node is restarted. How else could resources be allowed to start
> after a STONITH operation?
>
> > The cib process seems not to be restarted in this case...
>
> There is no 'cib' process.
Actually, there is :-)

> If I understand things right, the crmd process handles all core CIB
> maintenance operations.

Nope, that's all done by the CIB process.

> Try pstree -p and look for the group of processes where the parent is
> "heartbeat".
>
> HTH
> Yan
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
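P.S. For anyone following along at home, both points above can be checked on a running cluster node. A sketch, assuming the heartbeat 2.x command-line tools are installed (output will of course depend on your cluster):

```shell
# Dump the in-memory status section of the CIB -- the part that is never
# written to disk -- and look for the fail-count node attributes in it:
cibadmin -Q -o status | grep fail-count

# Confirm that cib and crmd run as separate processes, both children of
# the heartbeat master process:
pstree -p | grep -E 'heartbeat|cib|crmd'
```

Once the cluster (or that node) is stopped and restarted, the first command will show the fail-count attributes gone, which is exactly the reset behaviour described above.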
