Comments in-line

>>> On Tue, May 8, 2007 at 10:13 AM, in message <[EMAIL PROTECTED]>, Kai
Bjørnstad <[EMAIL PROTECTED]> wrote:
> Unfortunately the environment of the HA server/cluster I am trying to
> configure does not really fit with a grouping of IP/filesystem/LSB.
> In short: all the LSB services should be available on the same IP, and
> there is not necessarily a mapping between the filesystems and the LSB
> scripts (so I just have to play it safe).
>
> The ruleset is not that complicated really, it's just a lot of rules :-)
>
> - The IP group has colocation and ordering on
> - The Filesystem and LSB groups have colocation on and ordering off
>
> - Colocation between all Filesystems and the IP group
> - Colocation between all LSB scripts and the IP group
> - Colocation between all LSB scripts and all Filesystems
>
> - Start order from all LSB scripts to all Filesystems (this to enable
>   restart)
> - Start order between the groups: IP group before Filesystem group
>   before LSB group
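A ruleset like the one described above would be expressed in the Heartbeat 2.0.x CIB roughly as follows. This is only a minimal sketch with hypothetical resource IDs (fs_1, svc_1, ip_group); check the attribute names against the CIB DTD shipped with your Heartbeat version:

```xml
<constraints>
  <!-- keep a filesystem on the same node as the IP group -->
  <rsc_colocation id="fs_1_with_ip_group" from="fs_1" to="ip_group"
                  score="INFINITY"/>
  <!-- keep an LSB service with the IP group and with the filesystems -->
  <rsc_colocation id="svc_1_with_ip_group" from="svc_1" to="ip_group"
                  score="INFINITY"/>
  <rsc_colocation id="svc_1_with_fs_1" from="svc_1" to="fs_1"
                  score="INFINITY"/>
  <!-- start order: filesystem before LSB service -->
  <rsc_order id="svc_1_after_fs_1" from="svc_1" to="fs_1" type="after"/>
</constraints>
```

With many filesystems and many scripts this pattern multiplies quickly (one pair of constraints per filesystem/script combination), which is why the ruleset ends up large even though each rule is simple.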
To me, this looks like you need _everything_ in a single group...

> What I still do not understand is that a failed LSB script does not
> trigger a failover?? Neither does a failed Filesystem (it only stops
> all LSB scripts). Only a failed IP triggers a failover.

Don't forget: in a large dependency "group", when one resource wants to
move, many others are trying to stay, depending on your
default-resource-stickiness and default-resource-failure-stickiness.

To understand what's going on, you need some insight into the scores
allocated to each node / resource by the Policy Engine. On the currently
running configuration, you can get that (or at least some of it, not
sure...) with crm_verify -L -VVVVVV (possibly more Vs required).

To analyze retrospectively how decisions were made on a given
transition, use ptest on the relevant file from
/var/lib/heartbeat/pengine. Again, you'll need to crank up the
verbosity.

> Does this have anything to do with the "stickiness stuff"?? I have
> default-resource-stickiness = "100"
> default-resource-failure-stickiness = "-INFINITY"

Both of those will play a role, in particular the -failure- one: in this
case, it means failed resources should move at the first failure.

> On Monday 07 May 2007 18:09:48 Yan Fitterer wrote:
>> Haven't looked in too much detail (lots of resources / constraints in
>> your cib...), but I would approach the problem differently:
>>
>> Make groups out of related IP / filesystem / service stacks.
>>
>> Then use colocation constraints between services (across groups) to
>> force things to move together (if that is indeed what you are trying
>> to achieve).
>>
>> As well, I would start with maybe fewer resources, to make
>> experimentation and troubleshooting easier...
>>
>> What you describe below would seem broadly possible to me.
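As a rough mental model of how those two settings interact (this is an illustration, not the real Policy Engine arithmetic): with default-resource-stickiness = 100 and default-resource-failure-stickiness = -INFINITY, the preference for staying put is finite, so a single failure outweighs it and forces a move:

```python
# Toy model of stickiness scoring; a sketch under assumed semantics,
# not Heartbeat's actual Policy Engine code.
NEG_INF = float("-inf")

def node_score(running_here, failures,
               stickiness=100, failure_stickiness=NEG_INF):
    """Score for keeping a resource on its current node."""
    score = 0
    if running_here:
        score += stickiness                # default-resource-stickiness
    if failures:
        score += failures * failure_stickiness  # failure stickiness per failure
    return score

# A healthy resource prefers to stay on its node (+100)...
print(node_score(running_here=True, failures=0))   # 100
# ...but one failure drags the score to -INFINITY, so it must move.
print(node_score(running_here=True, failures=1))   # -inf
```

With a finite failure stickiness (say -50) instead, the resource would tolerate a couple of failures before the accumulated penalty outweighed the +100 stickiness, which is the usual way to get "restart locally first, move after repeated failures" behaviour.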
>>
>> My 2c
>>
>> Yan
>>
>> Kai Bjørnstad wrote:
>> > Hi,
>> >
>> > I am trying to set up an Active-Passive HA cluster doing "best
>> > effort", with little success.
>> > I am using Heartbeat 2.0.8.
>> >
>> > I have a set of IP resources, a set of external (iSCSI) mount
>> > resources, and a set of LSB script resources.
>> >
>> > The goal of the configuration is to make Heartbeat do the following:
>> > - All resources should run on the same node at all times
>> > - If one or more of the IPs go down, move all resources to the
>> >   backup node. If no backup node is available, shut everything down.
>> > - If one or more of the mounts go down, move all resources
>> >   (including IPs) to the backup node. If no backup node is
>> >   available, shut down all the LSB scripts and the failed mounts.
>> >   Keep the mounts and IPs that did not fail up.
>> > - If one or more of the LSB scripts fail, move all resources
>> >   (including mounts and IPs) to the backup node. If no backup node
>> >   is available, shut down the failed LSB script(s) but keep all
>> >   other resources running (best effort)
>> > - Of course, local restart should be attempted before moving to the
>> >   backup node.
>> > - Start IPs and mounts before the LSB scripts
>> > - Start/restart order of IPs should not be enforced
>> > - Start/restart order of mounts should not be enforced
>> > - Start/restart order of LSBs should not be enforced
>> >
>> > My question is basically: is this at all possible???
>
> --
> Kai R. Bjørnstad
> Senior Software Engineer
> dir. +47 22 62 89 43
> mob. +47 99 57 79 11
> tel. +47 22 62 89 50
> fax.
> +47 22 62 89 51
> [EMAIL PROTECTED]
>
> Olaf Helsets vei 6
> N-0621 Oslo, Norway
>
> Scali - www.scali.com
> Scaling the Linux Datacenter
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
