Corrado,

The error message implies that the Start method of resource rs1 failed 
-- either it exited non-zero, or it timed out (failed to complete within 
the Start_timeout interval).  You say there are no errors in 
/var/adm/messages, but that is unexpected.  Did you check 
/var/adm/messages on every node?  You need to look on the node where the 
RG was online to see the start-failure messages.
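For example, on the node where the RG was online, something like the 
following (just an illustrative grep; the exact wording of the failure 
message varies by resource type) should surface the relevant entries:

    grep rs1 /var/adm/messages | tail -50

Look for method-failure or timeout messages logged around the time you 
ran the scswitch command.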

[Note, the method that failed could be either the Start or Prenet_start 
method of rs1, depending on its resource type.]
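If you are unsure which start methods the resource type registers, the 
resource-type listing should show them; something along these lines 
(assuming the object-oriented CLI is installed, which your use of clrs 
suggests) ought to work:

    clresourcetype show -v <type of rs1>

Check whether a Prenet_start method is declared in addition to Start.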

The recovery action that is taken depends on the Failover_mode property 
of rs1.  If Failover_mode is set to Soft or Hard, the whole resource 
group will attempt to fail over to a different node; if no other node is 
available, the RG may try to restart on the same node.  If it succeeded 
in restarting, it might give the false appearance that the resource 
started successfully on the first try.
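You can check which value is in effect with the new-style CLI, for 
example:

    clresource show -p Failover_mode rs1

(clrs is just the short form of clresource, so clrs show -p 
Failover_mode rs1 should be equivalent).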

If Failover_mode has any value other than Soft or Hard, the resource 
group remains online on the same node but the resource moves to 
Start_failed state and the RG moves to Online_faulted state.  From your 
description, it does not sound like this is what happened.
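To confirm, the status subcommands should show the per-node states, 
e.g.:

    clresource status rs1
    clresourcegroup status <group name>

If neither the resource nor the group is reported in a faulted state on 
the node in question, then the RG most likely failed over or restarted 
as described above.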

You can examine the details of what happened by looking at 
/var/adm/messages on the node where the RG was online.  Also check for 
messages of the form "resource rs1 state on node xxx changed to yyy", 
which might appear on a different cluster node (the current RGM 
president node).
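For example, on each node you could run something like:

    grep "changed to" /var/adm/messages

to pick out those state-transition messages quickly.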


On 02/ 5/10 10:04 AM, Corrado Romano wrote:
> Hello,
>
> In a Sun Cluster 3.2 I have the following behavior:
> - if I enable the whole resource group with sudo scswitch -Z -g <group name>, 
> everything is correct
>
>
> - if I enable the resources one by one (e.g. resource rs1)  with the 
> following two commands in sequence: 
>
> scswitch -e -j rs1 
> the  resource starts correctly (normal messages in /var/adm/messages and no 
> errors) but I get this output:
> scswitch: resource group failed to start on chosen node; it may end up 
> failing over to other node(s).  No errors in /var/adm/messages.
>
> clrs monitor rs1
> the  monitor starts correctly (normal messages in /var/adm/messages and no 
> errors) but I get this output:
> resource group failed to start on chosen node; it may end up failing over 
> to other node(s)
>
> Does somebody know what the reason could be?  I am seeing this behavior for 
> the first time, after working with 5 other 3.2 clusters which behaved normally.
>
> Thx
> Corrado
>   
