Hi Michael, hi group,

I expanded my configuration and added the attributes resource-failure-stickiness and resource-stickiness to one resource of the "bar group". I used "-30" for resource-failure-stickiness and "100" for resource-stickiness. If I understand heartbeat's principles correctly, the resource should be started on the 2nd node after 3 failures on the 1st node (the 4th problem with the resource will make it switch).
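In case it helps to check my arithmetic, here is a minimal sketch (plain Python, not Heartbeat code) of the scoring as I understand it; the assumption that the other node's score starts at 0 is mine:

```python
# Sketch of heartbeat-style stickiness arithmetic (my understanding,
# not actual Heartbeat code): resource-failure-stickiness is added to
# the resource's score on the current node once per recorded failure,
# and the resource moves once that score drops below the other node's
# score (assumed to be 0 here).

RESOURCE_STICKINESS = 100          # preference for staying on the current node
RESOURCE_FAILURE_STICKINESS = -30  # penalty per failure

def score_on_current_node(failcount):
    return RESOURCE_STICKINESS + failcount * RESOURCE_FAILURE_STICKINESS

for failures in range(6):
    score = score_on_current_node(failures)
    switches = score < 0  # other node assumed to score 0
    print(f"failures={failures} score={score} switch={switches}")
```

With these numbers the score is still 10 after 3 failures and drops to -20 on the 4th, which matches my expectation that the 4th problem triggers the switch.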
To simulate a problem with the resource, I stopped the service and moved the binary away, so that the monitor detects that the service is not running but cannot start it again. Heartbeat starts the other resources in the resource group, fails twice to start the resource with the moved binary... and then does nothing! It does not switch.

I tried crm_failcount to see whether heartbeat has increased the failure counter, but it seems it has not:

crm_failcount -G -U server-1.domain.com -r resource_orbd

gives me

name=fail-count-obs_group value=(null)
Error performing operation: The object/attribute does not exist

Do I have to configure STONITH if I want to use these stickiness values? Am I missing something?

Best regards,
Jason

2009/2/9 Michael Schwartzkopff <[email protected]>:
> On Monday, 9 February 2009 16:08:44, J. Friedrich wrote:
>> Hi everyone,
>>
>> many thanks to Yan Gao for his fast answer to my HB GUI related
>> questions. Now, after the initial configuration with the GUI, I have
>> some problems with the fine tuning.
>>
>> The following scenario:
>>
>> We defined two resource groups: one group consists of a virtual IP
>> address, an Apache web server and an application (the foo group); the
>> other consists of a virtual IP address and an application (the bar
>> group). We want that, if any member of a group fails, the
>> resource group is switched to the 2nd server (server-2.domain.com) and
>> the 1st server (server-1.domain.com) is marked with a constraint so
>> that the service will not be switched to the 1st node until someone
>> has checked the 2nd server.
>
> heartbeat: the meta attribute resource-failure-stickiness adds these points,
> multiplied by the fail counter, to a resource's score. The node with the
> highest points wins.
>
> pacemaker: the meta attribute migration-threshold is compared directly to the
> fail counter of the resource. No messing with points any more.
>
>> We tried to set up a monitoring operation with prereq "nothing" and
>> on_fail "fence". But it does not work as expected. When we kill the
>> application on the first node, it is switched to the 2nd server.
>
> It should already have fenced the 1st node on the first error, i.e.
> switched it off. Did you define a working (!) STONITH resource?
>
>> So far, so good. But when we stop the application on the 1st server
>> (while the service is down on the 2nd server), heartbeat stops both
>> resource groups, the whole foo and the whole bar resource group, and
>> moves all services to the first node, even though no constraint is
>> defined and no dependency can be found between the two resource groups.
>
>> I know that node fencing is implemented via STONITH, and that is not
>> what we want. Neither do we want to restart the complete node if a
>> service or resource group fails.
>>
>> Hope that someone can resolve this mystery for me. I attached the
>> cib.xml for further analysis.
>
> Fencing is always done via STONITH. If you just want the nodes to obey
> the fail counter, use resource_failure_stickiness or migration_threshold.
>
> Greetings,
> --
> Dr.
> Michael Schwartzkopff
> MultiNET Services GmbH
> Addresse: Bretonischer Ring 7; 85630 Grasbrunn; Germany
> Tel: +49 - 89 - 45 69 11 0
> Fax: +49 - 89 - 45 69 11 21
> mob: +49 - 174 - 343 28 75
>
> mail: [email protected]
> web: www.multinet.de
>
> Sitz der Gesellschaft: 85630 Grasbrunn
> Registergericht: Amtsgericht München HRB 114375
> Geschäftsführer: Günter Jurgeneit, Hubert Martens
>
> ---
>
> PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B
> Skype: misch42

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
