Thanks for the tip on the debug commands, I did not know about those.
Unfortunately I am not any smarter after browsing the output :-)
> Don't forget: In a large dependency "group", when one resource wants to
> move, many others are trying to stay, depending on your
> default-resource-stickiness and default-resource-failure-stickiness.
The only thing I see when looking at the debug output is comparisons of
INFINITY vs. -INFINITY??
So my first "CIB design" was to put everything in one group. The problem with
this configuration is that the default start/restart order in a group is just
a linear list.
This means that when my group looks like this:
Group mygroup:
IP_1
IP_2
FS_1
FS_2
LSB_1
LSB_2
LSB_3
a failure of LSB_1 will make both LSB_2 and LSB_3 stop, even though in reality
they are not dependent on it at all. I get the same problem if FS_1 fails: the
LSB_x resources should stop, but FS_2 should still be up!
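For reference, a group like that would look roughly like this in the Heartbeat
2.x CIB (a sketch only; the resource ids are the placeholder names from my
example above, and the member list is truncated):

```xml
<group id="mygroup">
  <!-- group members start in document order and are implicitly co-located -->
  <primitive id="IP_1" class="ocf" provider="heartbeat" type="IPaddr"/>
  <primitive id="IP_2" class="ocf" provider="heartbeat" type="IPaddr"/>
  <primitive id="FS_1" class="ocf" provider="heartbeat" type="Filesystem"/>
  <primitive id="FS_2" class="ocf" provider="heartbeat" type="Filesystem"/>
  <primitive id="LSB_1" class="lsb" type="some-initscript"/>
  <!-- ... LSB_2, LSB_3 follow the same pattern ... -->
</group>
```

The linear chain IP_1 -> IP_2 -> FS_1 -> ... -> LSB_3 is exactly what creates
the false dependencies I describe above.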
As a consequence of this I wrote a new config that disables the group start
order but keeps the group co-location, and then explicitly sets rules so that
any/all LSB scripts start/restart after any/all Filesystem resources.
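Sketched in CIB terms, that is an unordered group plus explicit ordering
constraints. I am writing the attribute names from memory of the 2.0.x
syntax (the "ordered" group attribute and rsc_order from/to/type), so check
them against your crm version's DTD:

```xml
<!-- keep co-location, drop the implicit start order -->
<group id="mygroup" ordered="false">
  <!-- ... same primitives as before ... -->
</group>

<constraints>
  <!-- one rule per LSB/Filesystem pair: LSB_1 starts after FS_1 -->
  <rsc_order id="order_lsb1_after_fs1" from="LSB_1" to="FS_1" type="after"/>
  <rsc_order id="order_lsb1_after_fs2" from="LSB_1" to="FS_2" type="after"/>
  <!-- ... and so on for LSB_2, LSB_3 ... -->
</constraints>
```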
I set
default-resource-stickiness = "0"
default-resource-failure-stickiness = "0"
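For completeness, this is roughly where those two properties live in the CIB
(a sketch; the nvpair ids are arbitrary placeholders):

```xml
<crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <attributes>
      <!-- no preference for staying put, and no penalty on failure -->
      <nvpair id="opt-stickiness" name="default-resource-stickiness" value="0"/>
      <nvpair id="opt-fail-stickiness" name="default-resource-failure-stickiness" value="0"/>
    </attributes>
  </cluster_property_set>
</crm_config>
```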
This is when the magic begins:
When I now start up my passive system (dl360g3-2) I suddenly get resources
distributed all over the place! But the group still has co-location on!?
And everything was up and OK on the primary!?
Resource Group: HAsms
    ip_172.19.5.200                (heartbeat::ocf:IPaddr):      Started dl360g3-2
    ip_11.0.0.200                  (heartbeat::ocf:IPaddr):      Started dl360g3-1
    mount_opt_scali_var_scacim-db  (heartbeat::ocf:Filesystem):  Started dl360g3-2
    mount_opt_scali_var_scasmo-db  (heartbeat::ocf:Filesystem):  Started dl360g3-1
    mount_opt_scali_repository     (heartbeat::ocf:Filesystem):  Started dl360g3-2
    mount_tftpboot                 (heartbeat::ocf:Filesystem):  Started dl360g3-1
    mount_opt_scali_images         (heartbeat::ocf:Filesystem):  Started dl360g3-2
    mount_var_consoles             (heartbeat::ocf:Filesystem):  Started dl360g3-1
    lsb_scacim-pgsql               (lsb:scacim-pgsql):           Started dl360g3-2
    lsb_dhcpd                      (lsb:dhcpd):                  Started dl360g3-1
    lsb_scasmo-controller          (lsb:scasmo-controller):      Started dl360g3-2
    lsb_conserver                  (lsb:conserver):              Started dl360g3-1
    lsb_scasmo-factory             (lsb:scasmo-factory):         Started dl360g3-2
    lsb_scaproxyd                  (lsb:scaproxyd):              Started dl360g3-1
    lsb_scamond-mapper             (lsb:scamond-mapper):         Started dl360g3-2
    lsb_scasmo-server              (lsb:scasmo-server):          Started dl360g3-1
Then I set
default-resource-stickiness = "100"
default-resource-failure-stickiness = "0"
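In CIB terms that is just the same two cluster properties with a new value
(a sketch; nvpair ids are placeholders):

```xml
<nvpair id="opt-stickiness" name="default-resource-stickiness" value="100"/>
<nvpair id="opt-fail-stickiness" name="default-resource-failure-stickiness" value="0"/>
```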
Now the resources stay on dl360g3-1 when I start up dl360g3-2, but when I make
an LSB resource fail on dl360g3-1, Heartbeat suddenly moves the resource
to dl360g3-2, disregarding the group co-location rules??
cheers Kai
On Tuesday 08 May 2007 12:48:26 Yan Fitterer wrote:
> Comments in-line
--
Kai R. Bjørnstad
Senior Software Engineer
dir. +47 22 62 89 43
mob. +47 99 57 79 11
tel. +47 22 62 89 50
fax. +47 22 62 89 51
[EMAIL PROTECTED]
Olaf Helsets vei 6
N-0621 Oslo, Norway
Scali - www.scali.com
Scaling the Linux Datacenter
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems