Hi Michael, hi group,

I expanded my configuration and added the attributes resource-failure-stickiness and resource-stickiness to one resource of the "bar group". I used "-30" for resource-failure-stickiness and "100" for resource-stickiness. If I understand heartbeat's principles correctly, the resource should be started on the 2nd node after 3 failures on the 1st node (the 4th problem with the resource will make it switch).
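In case it helps to check my arithmetic, here is a minimal sketch (plain Python, not Heartbeat code) of the scoring as I understand it; the assumption that the other node's score starts at 0 is mine:

```python
# Sketch of heartbeat-style stickiness arithmetic (my understanding,
# not actual Heartbeat code): resource-failure-stickiness is added to
# the resource's score on the current node once per recorded failure,
# and the resource moves once that score drops below the other node's
# score (assumed to be 0 here).

RESOURCE_STICKINESS = 100          # preference for staying on the current node
RESOURCE_FAILURE_STICKINESS = -30  # penalty per failure

def score_on_current_node(failcount):
    return RESOURCE_STICKINESS + failcount * RESOURCE_FAILURE_STICKINESS

for failures in range(6):
    score = score_on_current_node(failures)
    switches = score < 0  # other node assumed to score 0
    print(f"failures={failures} score={score} switch={switches}")
```

With these numbers the score is still 10 after 3 failures and drops to -20 on the 4th, which matches my expectation that the 4th problem triggers the switch.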
To simulate a problem with the resource, I stopped the service and moved the binary away, so that the monitor detects that the service is not running but cannot start it again. Heartbeat starts the other resources in the resource group, fails twice to start the resource with the moved binary... and then does nothing! It does not switch.

I tried crm_failcount to see whether heartbeat has increased the failure counter, but it seems it has not:

crm_failcount -G -U server-1.domain.com -r resource_orbd

gives me

name=fail-count-obs_group value=(null)
Error performing operation: The object/attribute does not exist

Do I have to configure STONITH if I want to use these stickiness values? Am I missing something?

Best regards,
Jason

2009/2/9 Michael Schwartzkopff <[email protected]>:
> On Monday, 9 February 2009 16:08:44, J. Friedrich wrote:
>> Hi everyone,
>>
>> many thanks to Yan Gao for his fast answer to my HB GUI related
>> questions. Now, after the initial configuration with the GUI, I have
>> some problems with the fine tuning.
>>
>> The following scenario:
>>
>> We defined two resource groups: one group consists of a virtual IP
>> address, an Apache web server and an application (the foo group); the
>> other consists of a virtual IP address and an application (the bar
>> group). We want that, if any member of a group fails, the
>> resource group is switched to the 2nd server (server-2.domain.com) and
>> the 1st server (server-1.domain.com) is marked with a constraint so
>> that the service will not be switched to the 1st node until someone
>> has checked the 2nd server.
>
> heartbeat: the meta attribute resource-failure-stickiness adds these points,
> multiplied by the fail counter, to a resource's score. The node with the
> highest points wins.
>
> pacemaker: the meta attribute migration-threshold is compared directly to the
> fail counter of the resource. No messing with points any more.
>
>> We tried to set up a monitoring operation with prereq "nothing" and
>> on_fail "fence". But it does not work as expected. When we kill the
>> application on the first node, it is switched to the 2nd server.
>
> It should already have fenced the 1st node on the first error, i.e.
> switched it off. Did you define a working (!) STONITH resource?
>
>> So far, so good. But when we stop the application on the 1st server
>> (while the service is down on the 2nd server), heartbeat stops both
>> resource groups, the whole foo and the whole bar resource group, and
>> moves all services to the first node, even though no constraint is
>> defined and no dependency can be found between the two resource groups.
>
>> I know that node fencing is implemented via STONITH, and that is not
>> what we want. Neither do we want to restart the complete node if a
>> service or resource group fails.
>>
>> Hope that someone can resolve this mystery for me. I attached the
>> cib.xml for further analysis.
>
> Fencing is always done via STONITH. If you just want the nodes to obey
> the fail counter, use resource_failure_stickiness or migration_threshold.
>
> Greetings,
> --
> Dr.
> Michael Schwartzkopff
> MultiNET Services GmbH
> Addresse: Bretonischer Ring 7; 85630 Grasbrunn; Germany
> Tel: +49 - 89 - 45 69 11 0
> Fax: +49 - 89 - 45 69 11 21
> mob: +49 - 174 - 343 28 75
>
> mail: [email protected]
> web: www.multinet.de
>
> Sitz der Gesellschaft: 85630 Grasbrunn
> Registergericht: Amtsgericht München HRB 114375
> Geschäftsführer: Günter Jurgeneit, Hubert Martens
>
> ---
>
> PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B
> Skype: misch42

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
