Hi,

On Thu, Sep 06, 2007 at 06:47:16PM +0200, FG wrote:
> Hi,
>
> I use heartbeat 2.1.1 in an active/passive configuration.
>
> I'm testing different failover scenarios and need some explanations.
>
> My nodes are castor (active) and pollux (standby).
>
> I'm testing process failover with monitoring. My configuration uses
> default_stickiness = "200" and default_failure_stickiness = "-200",
> and a rsc_location constraint on castor with a score of "200".
> With these options, I can have 5 process failures before the services
> fail over.
>
> It works like a charm... :-)
>
> The score on castor decreases from 1000 (4 resources x 200 +
> constraint score 200) to 0, and the sixth failure triggers the
> failover. The scores after failover are: castor (-1000) and
> pollux (800).
>
> [EMAIL PROTECTED] crm]# ptest -L -VVVVVVVVVVVVVVVVVVVVV 2>&1 | grep assign
> ptest[31985]: 2007/09/06_15:57:25 debug: debug5: do_calculations: assign nodes to colors
> ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Color IPaddr_147_210_36_7, Node[0] pollux: 800
> ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Color IPaddr_147_210_36_7, Node[1] castor: -1000
> ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Assigning pollux to IPaddr_147_210_36_7
> ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Color Filesystem_2, Node[0] pollux: 1000000
> ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Color Filesystem_2, Node[1] castor: -1000000
> ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Assigning pollux to Filesystem_2
> ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Color cyrus-imapd_3, Node[0] pollux: 1000000
> ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Color cyrus-imapd_3, Node[1] castor: -1000000
> ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Assigning pollux to cyrus-imapd_3
> ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Color saslauthd_4, Node[0] pollux: 1000000
> ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Color saslauthd_4, Node[1] castor: -1000000
> ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Assigning pollux to saslauthd_4
> ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Color pingd-child:0, Node[0] castor: 1
> ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Color pingd-child:0, Node[1] pollux: 0
> ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Assigning castor to pingd-child:0
> ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Color pingd-child:1, Node[0] pollux: 1
> ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Color pingd-child:1, Node[1] castor: -1000000
> ptest[31985]: 2007/09/06_15:57:25 debug: native_assign_node: Assigning pollux to pingd-child:1
>
> Now, to test, I unplug the network card on pollux. I expected a new
> failover to the first node (castor), but nothing happened...
> So I looked at my scores and my log:
>
> [EMAIL PROTECTED] crm]# ptest -L -VVVVVVVVVVVVVVVVVVVVV 2>&1 | grep assign
> ptest[32467]: 2007/09/06_16:17:11 debug: debug5: do_calculations: assign nodes to colors
> ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: Color IPaddr_147_210_36_7, Node[0] castor: -1000
> ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: Color IPaddr_147_210_36_7, Node[1] pollux: -1000000
> ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: All nodes for resource IPaddr_147_210_36_7 are unavailable, unclean or shutting down
> ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: Color Filesystem_2, Node[0] castor: -1000000
> ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: Color Filesystem_2, Node[1] pollux: -1000000
> ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: All nodes for resource Filesystem_2 are unavailable, unclean or shutting down
> ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: Color cyrus-imapd_3, Node[0] castor: -1000000
> ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: Color cyrus-imapd_3, Node[1] pollux: -1000000
> ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: All nodes for resource cyrus-imapd_3 are unavailable, unclean or shutting down
> ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: Color saslauthd_4, Node[0] castor: -1000000
> ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: Color saslauthd_4, Node[1] pollux: -1000000
> ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: All nodes for resource saslauthd_4 are unavailable, unclean or shutting down
> ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: Color pingd-child:0, Node[0] castor: 1
> ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: Color pingd-child:0, Node[1] pollux: 0
> ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: Assigning castor to pingd-child:0
> ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: Color pingd-child:1, Node[0] pollux: 1
> ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: Color pingd-child:1, Node[1] castor: -1000000
> ptest[32467]: 2007/09/06_16:17:12 debug: native_assign_node: Assigning pollux to pingd-child:1
>
> pengine[20890]: 2007/09/06_16:00:23 WARN: native_color: Resource IPaddr_147_210_36_7 cannot run anywhere
> pengine[20890]: 2007/09/06_16:00:23 WARN: native_color: Resource Filesystem_2 cannot run anywhere
> pengine[20890]: 2007/09/06_16:00:23 WARN: native_color: Resource cyrus-imapd_3 cannot run anywhere
> pengine[20890]: 2007/09/06_16:00:23 WARN: native_color: Resource saslauthd_4 cannot run anywhere
>
> Could someone explain to me what's happening? Is that split-brain???

Yes, it is.

> Because pingd failed, and my rule sets score="-INFINITY", I think the
> scores on pollux are logical, aren't they? And in the end we have the
> same score for the resources on the two nodes.
> How can I avoid this behavior?

The cluster won't try to run the resources on a node which has a
negative score, i.e. one on which the resource failed too many times.
That seems to be your case. Try to reset the failcount and see if
that helps.

Thanks.

Dejan

> I attach my settings (cibadmin -Q in a normal state); would you please
> help verify them?
>
> Thanks, regards
>
> Fabrice
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
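
The score arithmetic FG quotes can be checked in a few lines of Python. This is only an illustration of the numbers in the message above, not Heartbeat code; the function name and parameter names are invented for the sketch:

```python
# One way to read the scores in FG's message: castor runs 4 resources,
# each contributing default_stickiness = 200, plus an rsc_location
# constraint score of 200; each monitored failure adds
# default_failure_stickiness = -200.

def castor_score(failures, resources=4, stickiness=200,
                 location_score=200, failure_stickiness=-200):
    """Hypothetical helper: castor's total score after `failures` failures."""
    return (resources * stickiness + location_score
            + failures * failure_stickiness)

print(castor_score(0))  # 1000: the starting score FG reports
print(castor_score(5))  # 0: five failures are tolerated
print(castor_score(6))  # -200: the sixth failure drops castor below pollux

# After failover the four resources run on pollux, so castor keeps only
# the location constraint plus the accumulated failure penalties, which
# matches the ptest output (castor -1000, pollux 800):
print(200 + 6 * -200)   # -1000
print(4 * 200)          # 800
```

Under this reading, the reason unplugging pollux produces no failback is visible in the second ptest run: castor's accumulated failure penalties leave it at -1000, so both nodes score negatively and the resources can run nowhere until the failcount is reset.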
