Junko IKEDA wrote:
Once again, something about SplitBrain...
During the SplitBrain, I wrecked the resource on both nodes, and the
fail count was increased at that time.
But after recovering from the SplitBrain, the fail count returned to
zero on both nodes!
Is this due to the restart of crmd or pengine/tengine?
Most probably. The fail count belongs to the status section, which
is not saved.
Where is the status section saved?
The status section is never saved to disk. When the cluster is stopped,
the status section disappears altogether.
I thought that CIB kept the status.
Yes, it does. But status has no meaning once the cluster is stopped, so
it isn't kept. Hence fail counts are reset when the cluster is restarted.
As well, the failcount for a specific node will be reset when _that_
node is restarted. How else could resources be allowed to start after a
STONITH operation?
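To illustrate the point, here is a minimal sketch of the idea: fail counts live as transient per-node attributes inside the CIB's status section, so rebuilding the CIB without that section necessarily loses them. The element and attribute names below are simplified for illustration and are not the real CIB schema, which varies by version.

```python
import xml.etree.ElementTree as ET

# Simplified CIB-like document: the configuration section is persistent,
# while the status section (holding fail counts) is transient.
# Names here are illustrative, not the actual CIB schema.
cib_xml = """
<cib>
  <configuration>
    <resources>
      <primitive id="my_resource"/>
    </resources>
  </configuration>
  <status>
    <node_state uname="node1">
      <transient_attributes>
        <nvpair name="fail-count-my_resource" value="2"/>
      </transient_attributes>
    </node_state>
  </status>
</cib>
"""

def fail_count(cib, node, resource):
    """Look up a fail count in the transient status section, defaulting to 0."""
    for state in cib.iter("node_state"):
        if state.get("uname") != node:
            continue
        for nv in state.iter("nvpair"):
            if nv.get("name") == "fail-count-" + resource:
                return int(nv.get("value"))
    return 0

cib = ET.fromstring(cib_xml)
print(fail_count(cib, "node1", "my_resource"))  # 2 while the cluster is up

# A cluster restart rebuilds the CIB without the old status section,
# so the fail count falls back to its default of zero:
cib.remove(cib.find("status"))
print(fail_count(cib, "node1", "my_resource"))  # 0
```

The configuration section survives the round trip untouched; only the status (and with it the fail count) is gone, which matches the behaviour described above.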
The cib process does not seem to be restarted in this case...
There is no 'cib' process. If I understand things right, the crmd
process handles all core CIB maintenance operations. Try pstree -p and
look for the group of processes whose parent is "heartbeat".
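What pstree shows is just the parent/child relationship recorded in /proc. As a rough sketch of doing the same lookup programmatically (assuming a Linux /proc filesystem; the actual process names under heartbeat vary with the version):

```python
import os

def children_of(ppid):
    """Return the PIDs whose parent is ppid, by reading /proc/<pid>/stat (Linux)."""
    kids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open("/proc/%s/stat" % entry) as f:
                stat = f.read()
        except OSError:
            continue  # process exited while we were scanning
        # Field 4 of /proc/<pid>/stat is the parent PID; the comm field
        # (in parentheses) may contain spaces, so split after the last ')'.
        fields = stat.rsplit(")", 1)[1].split()
        if int(fields[1]) == ppid:
            kids.append(int(entry))
    return kids

# Pointing this at the heartbeat master process's PID would list its
# managed child processes; here we just show that the current process
# appears among the children of its own parent.
print(os.getpid() in children_of(os.getppid()))  # True on Linux
```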
HTH
Yan
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems