Thanks for the tip on the debug commands, I did not know about those.
Unfortunately I am not any smarter after browsing the output :-)
> Don't forget: In a large dependency "group", when one resource wants to
> move, many others are trying to stay, depending on your
> default-resource-stickiness and default-resource-failure-stickiness.
The only thing I see when looking at the debug output is comparisons of
INFINITY vs. -INFINITY??
So my first "CIB design" was to put everything in one group. The problem with
this configuration is that the default start/restart order in a group is just
a linear list.
This means that when my group looks like this:
Group mygroup:
IP_1
IP_2
FS_1
FS_2
LSB_1
LSB_2
LSB_3
a failure of LSB_1 will make both LSB_2 and LSB_3 stop, even though in reality
they are not dependent on it at all. I get the same problem if FS_1 fails: the
LSB_x resources should stop, but FS_2 should still be up!
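For reference, a group like that would look roughly like this in the Heartbeat
2.x CIB (a sketch only; the resource ids are the placeholder names from my
example above, and the member list is truncated):

```xml
<group id="mygroup">
  <!-- group members start in document order and are implicitly co-located -->
  <primitive id="IP_1" class="ocf" provider="heartbeat" type="IPaddr"/>
  <primitive id="IP_2" class="ocf" provider="heartbeat" type="IPaddr"/>
  <primitive id="FS_1" class="ocf" provider="heartbeat" type="Filesystem"/>
  <primitive id="FS_2" class="ocf" provider="heartbeat" type="Filesystem"/>
  <primitive id="LSB_1" class="lsb" type="some-initscript"/>
  <!-- ... LSB_2, LSB_3 follow the same pattern ... -->
</group>
```

The linear chain IP_1 -> IP_2 -> FS_1 -> ... -> LSB_3 is exactly what creates
the false dependencies I describe above.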
As a consequence of this I wrote a new config that disables the group start
order but keeps the group co-location, and then explicitly sets rules so that
any/all LSB scripts start/restart after any/all Filesystem resources.
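Sketched in CIB terms, that is an unordered group plus explicit ordering
constraints. I am writing the attribute names from memory of the 2.0.x
syntax (the "ordered" group attribute and rsc_order from/to/type), so check
them against your crm version's DTD:

```xml
<!-- keep co-location, drop the implicit start order -->
<group id="mygroup" ordered="false">
  <!-- ... same primitives as before ... -->
</group>

<constraints>
  <!-- one rule per LSB/Filesystem pair: LSB_1 starts after FS_1 -->
  <rsc_order id="order_lsb1_after_fs1" from="LSB_1" to="FS_1" type="after"/>
  <rsc_order id="order_lsb1_after_fs2" from="LSB_1" to="FS_2" type="after"/>
  <!-- ... and so on for LSB_2, LSB_3 ... -->
</constraints>
```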
I set
default-resource-stickiness = "0"
default-resource-failure-stickiness = "0"
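For completeness, this is roughly where those two properties live in the CIB
(a sketch; the nvpair ids are arbitrary placeholders):

```xml
<crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <attributes>
      <!-- no preference for staying put, and no penalty on failure -->
      <nvpair id="opt-stickiness" name="default-resource-stickiness" value="0"/>
      <nvpair id="opt-fail-stickiness" name="default-resource-failure-stickiness" value="0"/>
    </attributes>
  </cluster_property_set>
</crm_config>
```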
This is when the magic begins:
When I now start up my passive system (dl360g3-2) I suddenly get resources
distributed all over the place! But the group still has co-location on!?
And everything was up and OK on the primary!?
Resource Group: HAsms
    ip_172.19.5.200                (heartbeat::ocf:IPaddr):      Started dl360g3-2
    ip_11.0.0.200                  (heartbeat::ocf:IPaddr):      Started dl360g3-1
    mount_opt_scali_var_scacim-db  (heartbeat::ocf:Filesystem):  Started dl360g3-2
    mount_opt_scali_var_scasmo-db  (heartbeat::ocf:Filesystem):  Started dl360g3-1
    mount_opt_scali_repository     (heartbeat::ocf:Filesystem):  Started dl360g3-2
    mount_tftpboot                 (heartbeat::ocf:Filesystem):  Started dl360g3-1
    mount_opt_scali_images         (heartbeat::ocf:Filesystem):  Started dl360g3-2
    mount_var_consoles             (heartbeat::ocf:Filesystem):  Started dl360g3-1
    lsb_scacim-pgsql               (lsb:scacim-pgsql):           Started dl360g3-2
    lsb_dhcpd                      (lsb:dhcpd):                  Started dl360g3-1
    lsb_scasmo-controller          (lsb:scasmo-controller):      Started dl360g3-2
    lsb_conserver                  (lsb:conserver):              Started dl360g3-1
    lsb_scasmo-factory             (lsb:scasmo-factory):         Started dl360g3-2
    lsb_scaproxyd                  (lsb:scaproxyd):              Started dl360g3-1
    lsb_scamond-mapper             (lsb:scamond-mapper):         Started dl360g3-2
    lsb_scasmo-server              (lsb:scasmo-server):          Started dl360g3-1
Then I set
default-resource-stickiness = "100"
default-resource-failure-stickiness = "0"
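In CIB terms that is just the same two cluster properties with a new value
(a sketch; nvpair ids are placeholders):

```xml
<nvpair id="opt-stickiness" name="default-resource-stickiness" value="100"/>
<nvpair id="opt-fail-stickiness" name="default-resource-failure-stickiness" value="0"/>
```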
Now the resources stay on dl360g3-1 when I start up dl360g3-2, but when I make
an LSB resource fail on dl360g3-1, Heartbeat suddenly moves the resource
to dl360g3-2, disregarding the group co-location rules??
cheers Kai
On Tuesday 08 May 2007 12:48:26 Yan Fitterer wrote:
> Comments in-line
--
Kai R. Bjørnstad
Senior Software Engineer
dir. +47 22 62 89 43
mob. +47 99 57 79 11
tel. +47 22 62 89 50
fax. +47 22 62 89 51
[EMAIL PROTECTED]
Olaf Helsets vei 6
N-0621 Oslo, Norway
Scali - www.scali.com
Scaling the Linux Datacenter
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems