Re: [Linux-HA] resource unmanaged/failed

Dejan Muhamedagic Thu, 08 Dec 2011 08:32:20 -0800

Hi,
On Wed, Dec 07, 2011 at 04:56:31PM +0600, Aleksey V. Kashin wrote:
> Hello.
> 
> I have two servers (radius1, radius2). I've set up a cluster resource,
> IPaddr2, using the following commands:
> 
> # crm configure property stonith-enabled="false"


For a 2-node cluster, disabling stonith is really bad: without fencing
the cluster cannot recover once a stop operation fails.

> # crm configure property no-quorum-policy="ignore"
> # crm configure primitive raddb_ip ocf:heartbeat:IPaddr2 params ip="10.99.2.57" cidr_netmask="32" op monitor interval="15s"
> # crm configure group raddb raddb_ip
> # crm configure location raddb-prefers-radius1 raddb inf: radius1
> # crm configure rsc_defaults resource-stickiness=1000001
> 
> All OK.
> 
> But sometimes the load on radius1 increases and the server starts
> swapping, and at that moment the resource becomes "(unmanaged) FAILED".
> Below is an example of the "unmanaged" resource:
> 
> # crm_mon
> ============
> Last updated: Wed Dec  7 14:56:20 2011
> Stack: openais
> Current DC: radius1 - partition with quorum
> Version: 1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
> 2 Nodes configured, 2 expected votes
> 1 Resources configured.
> ============
> 
> Online: [ radius2 radius1 ]
> 
>   Resource Group: raddb
>       raddb_ip   (ocf::heartbeat:IPaddr2):       Started radius1 (unmanaged) FAILED
> 
> Failed actions:
>      raddb_ip_monitor_15000 (node=radius1, call=4, rc=-2, status=Timed Out): unknown exec error
>      raddb_ip_stop_0 (node=radius1, call=5, rc=-2, status=Timed Out): unknown exec error
> 
> 
> I've posted part of /var/log/syslog from radius1 here:
> http://paste.org/41963
> 
> 
> At that moment the IP address 10.99.2.57 is still up and the server
> responds to requests coming to this IP. However, sometimes the resource
> becomes completely unavailable and I have to restart corosync. That is
> very bad.
> 
> I think the resource becomes unmanaged because the server is using swap
> and some of the corosync processes are swapped out. I tested this
> hypothesis: when the server is using a lot of swap, the resource becomes
> "unmanaged".

corosync gets swapped? How interesting.

> I use Debian GNU/Linux 5.x and these packages from
> http://people.debian.org/~madkiss/ha/:
> 
> # dpkg -l |grep cluster
> ii  cluster-glue     1.0.7+hg2618-2~bpo50+1  The reusable cluster components for Linux HA
> ii  corosync         1.4.2-1~bpo50+1         Standards-based cluster framework (daemon an
> ii  libcluster-glue  1.0.7+hg2618-2~bpo50+1  Reusable cluster libraries (transitional pac
> ii  libcorosync4     1.4.2-1~bpo50+1         Standards-based cluster framework (libraries
> ii  libcrmcluster1   1.1.5-3~bpo50+1         Pacemaker libraries - CRM
> ii  liblrm2          1.0.7+hg2618-2~bpo50+1  Reusable cluster libraries -- liblrm2
> ii  libpils2         1.0.7+hg2618-2~bpo50+1  Reusable cluster libraries -- libpils2
> ii  libplumb2        1.0.7+hg2618-2~bpo50+1  Reusable cluster libraries -- libplumb2
> ii  libplumbgpl2     1.0.7+hg2618-2~bpo50+1  Reusable cluster libraries -- libplumbgpl2
> ii  libstonith1      1.0.7+hg2618-2~bpo50+1  Reusable cluster libraries -- libstonith1
> ii  pacemaker        1.1.5-3~bpo50+1         HA cluster resource manager
> 
> 
> 
> I can't add RAM to these servers. How can I keep the resource from
> becoming "unmanaged/failed"?

Buy more memory. If you cannot, then I don't see any point in
using clustering.
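
As for what the status shows: in the crm_mon output quoted above, the
monitor timed out first, and then the stop issued to recover the
resource timed out too. With stonith disabled the cluster has nothing
left to try, so the resource is left unmanaged. If you save crm_mon
output, a small filter makes such timeouts easy to spot. This is only a
sketch: the function name timed_out_actions is made up for illustration,
and the sample text is the "Failed actions" section from your mail with
the wrapped lines rejoined.

```shell
#!/bin/sh
# Sample input: the "Failed actions" section quoted in this thread,
# with the mail-wrapped lines joined back together.
sample='Failed actions:
     raddb_ip_monitor_15000 (node=radius1, call=4, rc=-2, status=Timed Out): unknown exec error
     raddb_ip_stop_0 (node=radius1, call=5, rc=-2, status=Timed Out): unknown exec error'

# Hypothetical helper: read crm_mon text on stdin and print
# "<operation> <node>" for every failed action that timed out.
timed_out_actions() {
    awk '/status=Timed Out/ {
        node = $2                   # e.g. "(node=radius1,"
        sub(/^\(node=/, "", node)   # drop the leading "(node="
        sub(/,$/, "", node)         # drop the trailing comma
        print $1, node
    }'
}

printf '%s\n' "$sample" | timed_out_actions
```

On a live node the same filter could be fed from "crm_mon -1". Once the
load is back to normal, "crm resource cleanup raddb_ip" should clear the
failed state without a corosync restart, but it does nothing about the
cause.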

Thanks,

Dejan


> With Best Regards.
> Aleksey V. Kashin
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
