On 10/4/07, Junko IKEDA <[EMAIL PROTECTED]> wrote:
> Hi,
>
> when I tried the following case,
> the return code of start action was something strange.
>
> 1) There are two node; active and standby node
> 2) one resource is running on the active node
> 3) SplitBrain came up!

did you create the split brain, or did it occur on its own?

> 4) the resource would be going to start on the both node,

you don't have stonith configured, right?
because this is exactly the reason why two-node clusters, particularly ones without stonith configured, are a seriously bad idea.
at least configure pingd so that only one side will try to run the resources.

> I drive it into failure on purpose on the standby node.
> so, the return code of start action would be -1 on standby.
> (it worked well)

-1 means "timed out"... that's not a good value to return from an RA.

the whole concept of trying to handle this in a resource's start action is a horrible substitute for a correctly configured cluster.
continuing down this path will only lead to pain.

> 5) after recovering SplitBrain, the return code on standby node was "-2"...
> and crm_mon on the active node also showed it as -2.
>
> Why is it incremented?

i'm not sure i follow this anymore... which return code are you talking about?
if you're talking about the one from the start action, it is never modified in any way.

> The fail count for this action was reset.

actions don't have failcounts... resources do, so again, i'm not 100% sure what you're talking about here.

> Is the fail for start action special?

not in any way that would result in it being incremented, if that's what you mean.
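
to illustrate the point about return values: a minimal sketch, in Python, of a start action that reports failure with the OCF generic error code (1) rather than -1. the `launch` callable and the `main` dispatcher are hypothetical names made up for this example, not part of any real agent; only the numeric exit codes come from the OCF resource agent convention.

```python
# Standard OCF resource agent exit codes (all non-negative).
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_ERR_UNIMPLEMENTED = 3


def start(launch):
    """Run the hypothetical `launch` callable and map its outcome to an OCF code.

    Any failure, including an exception, is reported as OCF_ERR_GENERIC
    instead of a negative value such as -1.
    """
    try:
        ok = bool(launch())
    except Exception:
        ok = False
    return OCF_SUCCESS if ok else OCF_ERR_GENERIC


def main(argv, launch=lambda: True):
    """Dispatch on the action name in argv[1]; only `start` is sketched here."""
    action = argv[1] if len(argv) > 1 else ""
    if action == "start":
        return start(launch)
    return OCF_ERR_UNIMPLEMENTED
```

a real agent would pass the value returned by `main(sys.argv)` to `sys.exit`, so that the cluster sees a well-defined exit status for the operation.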
