Comments in-line

>>> On Tue, May 8, 2007 at 10:13 AM, in message <[EMAIL PROTECTED]>, Kai
Bjørnstad <[EMAIL PROTECTED]> wrote:
> Unfortunately the environment of the HA server/cluster I am trying to
> configure does not really fit with a grouping of IP/filesystem/LSB.
> In short: all the LSB services should be available on the same IP, and
> there is not necessarily a mapping between the filesystems and the LSB
> scripts (so I just have to play it safe).
>
> The ruleset is not that complicated really, it's just a lot of rules :-)
>
> - The IP group has colocation and ordering on
> - The Filesystem and LSB groups have colocation on and ordering off
>
> - Colocation between all Filesystems and the IP group
> - Colocation between all LSB scripts and the IP group
> - Colocation between all LSB scripts and all Filesystems
>
> - Start order from all LSB scripts to all Filesystems (this to enable
>   restart)
> - Start order between the groups: IP group before Filesystem group
>   before LSB group
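A ruleset like the one described above would be expressed in the Heartbeat 2.0.x CIB roughly as follows. This is only a minimal sketch with hypothetical resource IDs (fs_1, svc_1, ip_group); check the attribute names against the CIB DTD shipped with your Heartbeat version:

```xml
<constraints>
  <!-- keep a filesystem on the same node as the IP group -->
  <rsc_colocation id="fs_1_with_ip_group" from="fs_1" to="ip_group"
                  score="INFINITY"/>
  <!-- keep an LSB service with the IP group and with the filesystems -->
  <rsc_colocation id="svc_1_with_ip_group" from="svc_1" to="ip_group"
                  score="INFINITY"/>
  <rsc_colocation id="svc_1_with_fs_1" from="svc_1" to="fs_1"
                  score="INFINITY"/>
  <!-- start order: filesystem before LSB service -->
  <rsc_order id="svc_1_after_fs_1" from="svc_1" to="fs_1" type="after"/>
</constraints>
```

With many filesystems and many scripts this pattern multiplies quickly (one pair of constraints per filesystem/script combination), which is why the ruleset ends up large even though each rule is simple.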
To me, this looks like you need _everything_ in a single group...

> What I still do not understand is that a failed LSB script does not
> trigger a failover?? Neither does a failed Filesystem (it only stops
> all LSB scripts). Only a failed IP triggers a failover.

Don't forget: in a large dependency "group", when one resource wants to
move, many others are trying to stay, depending on your
default-resource-stickiness and default-resource-failure-stickiness.

To understand what's going on, you need some insight into the scores
allocated to each node / resource by the Policy Engine. On the currently
running configuration, you can get that (or at least some of it, not
sure...) with crm_verify -L -VVVVVV (possibly more Vs required).

To analyze retrospectively how decisions were made on a given
transition, use ptest on the relevant file from
/var/lib/heartbeat/pengine. Again, you'll need to crank up the
verbosity.

> Does this have anything to do with the "stickiness stuff"?? I have
> default-resource-stickiness = "100"
> default-resource-failure-stickiness = "-INFINITY"

Both of those will play a role, in particular the -failure- one: in this
case, it means failed resources should move at the first failure.

> On Monday 07 May 2007 18:09:48 Yan Fitterer wrote:
>> Haven't looked in too much detail (lots of resources / constraints in
>> your cib...), but I would approach the problem differently:
>>
>> Make groups out of related IP / filesystem / service stacks.
>>
>> Then use colocation constraints between services (across groups) to
>> force things to move together (if that is indeed what you are trying
>> to achieve).
>>
>> As well, I would start with maybe fewer resources, to make
>> experimentation and troubleshooting easier...
>>
>> What you describe below would seem broadly possible to me.
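As a rough mental model of how those two settings interact (this is an illustration, not the real Policy Engine arithmetic): with default-resource-stickiness = 100 and default-resource-failure-stickiness = -INFINITY, the preference for staying put is finite, so a single failure outweighs it and forces a move:

```python
# Toy model of stickiness scoring; a sketch under assumed semantics,
# not Heartbeat's actual Policy Engine code.
NEG_INF = float("-inf")

def node_score(running_here, failures,
               stickiness=100, failure_stickiness=NEG_INF):
    """Score for keeping a resource on its current node."""
    score = 0
    if running_here:
        score += stickiness                # default-resource-stickiness
    if failures:
        score += failures * failure_stickiness  # failure stickiness per failure
    return score

# A healthy resource prefers to stay on its node (+100)...
print(node_score(running_here=True, failures=0))   # 100
# ...but one failure drags the score to -INFINITY, so it must move.
print(node_score(running_here=True, failures=1))   # -inf
```

With a finite failure stickiness (say -50) instead, the resource would tolerate a couple of failures before the accumulated penalty outweighed the +100 stickiness, which is the usual way to get "restart locally first, move after repeated failures" behaviour.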
>>
>> My 2c
>>
>> Yan
>>
>> Kai Bjørnstad wrote:
>> > Hi,
>> >
>> > I am trying to set up an Active-Passive HA cluster doing "best
>> > effort", with little success.
>> > I am using Heartbeat 2.0.8.
>> >
>> > I have a set of IP resources, a set of external (iSCSI) mount
>> > resources, and a set of LSB script resources.
>> >
>> > The goal of the configuration is to make Heartbeat do the following:
>> > - All resources should run on the same node at all times
>> > - If one or more of the IPs go down, move all resources to the
>> >   backup node. If no backup node is available, shut everything down.
>> > - If one or more of the mounts go down, move all resources
>> >   (including IPs) to the backup node. If no backup node is
>> >   available, shut down all the LSB scripts and the failed mounts.
>> >   Keep the mounts and IPs that did not fail up.
>> > - If one or more of the LSB scripts fail, move all resources
>> >   (including mounts and IPs) to the backup node. If no backup node
>> >   is available, shut down the failed LSB script(s) but keep all
>> >   other resources running (best effort)
>> > - Of course, local restart should be attempted before moving to the
>> >   backup node.
>> > - Start IPs and mounts before the LSB scripts
>> > - Start/restart order of IPs should not be enforced
>> > - Start/restart order of mounts should not be enforced
>> > - Start/restart order of LSBs should not be enforced
>> >
>> > My question is basically: is this at all possible???
>
> --
> Kai R. Bjørnstad
> Senior Software Engineer
> dir. +47 22 62 89 43
> mob. +47 99 57 79 11
> tel. +47 22 62 89 50
> fax.
> +47 22 62 89 51
> [EMAIL PROTECTED]
>
> Olaf Helsets vei 6
> N-0621 Oslo, Norway
>
> Scali - www.scali.com
> Scaling the Linux Datacenter
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
