Re: [Linux-ha-dev] What happened to rsc_state?

2006-05-12 Thread Peter Kruse

Hi,

Andrew Beekhof wrote:


I ran ptest and it wants to start fence1:1 and fence2:1.

the CRM probably just needs a little poke to rerun the PE.
try: crm_attribute -n last_cleanup -v `date -r`


ah!  that did the trick, but I had to use `date -R` ;)
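A cleanup script can build the same poke without shelling out to `date`, which also sidesteps the `-r`/`-R` mix-up. Below is a minimal Python sketch. It assumes nothing beyond the `crm_attribute -n last_cleanup -v <value>` call quoted above, and the helper names `last_cleanup_argv` and `poke_crm` are hypothetical. Python's `email.utils.formatdate` produces the RFC 2822 format that GNU `date -R` prints. According to the reply below, 2.0.6 and later do this automatically inside crm_resource, so the poke should only be needed on earlier releases.

```python
import shlex
import subprocess
from email.utils import formatdate


def last_cleanup_argv():
    """Build the crm_attribute call from this thread. formatdate() yields an
    RFC 2822 timestamp such as 'Fri, 12 May 2006 10:00:00 +0200', the same
    format GNU `date -R` prints."""
    return ["crm_attribute", "-n", "last_cleanup",
            "-v", formatdate(localtime=True)]


def poke_crm(dry_run=True):
    """Print the command (dry run) or execute it and return its exit status.
    Assumption: any change to the attribute value is enough to make the CRM
    rerun the policy engine."""
    argv = last_cleanup_argv()
    if dry_run:
        print(shlex.join(argv))  # quoted, so it can be pasted into a shell
        return 0
    return subprocess.call(argv)
```

Leaving `dry_run=True` prints a shell-quoted command line for inspection. Passing `dry_run=False` runs it on a cluster node.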



I cleaned this up for 2.0.6 earlier this week... the problem is that
-C results in a delete in the status section, which is problematic to
detect reliably (you'll get *way* more false positives than true
hits).

So in 2.0.6 crm_resource does the equivalent of the above command
automatically.


Very good, I will add it to my script then.
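A script like this can generate the `crm_resource -C` calls from the (resource, host) pairs it wants to clean up. Below is a minimal sketch. `cleanup_argvs` is a hypothetical helper, and its argument layout simply mirrors the commands used in this thread. Duplicate pairs are skipped because the command list attached further down runs most cleanups twice. On releases before 2.0.6, the `crm_attribute -n last_cleanup` poke would follow the last call.

```python
def cleanup_argvs(pairs):
    """Build one `crm_resource -C -r <resource> -t primitive -H <host>`
    command per distinct (resource, host) pair, keeping first-seen order."""
    seen = set()
    argvs = []
    for resname, hostname in pairs:
        if (resname, hostname) in seen:
            continue  # several failed ops on one node need only one cleanup
        seen.add((resname, hostname))
        argvs.append(["crm_resource", "-C", "-r", resname,
                      "-t", "primitive", "-H", hostname])
    return argvs
```

Each returned list can be passed directly to `subprocess.call`.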

Best regards,

Peter
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] What happened to rsc_state?

2006-05-11 Thread Andrew Beekhof

On 5/10/06, Peter Kruse [EMAIL PROTECTED] wrote:

Hi,

Andrew Beekhof wrote:
 On 5/9/06, Peter Kruse [EMAIL PROTECTED] wrote:

 although cibadmin -Ql -o status does not show the failed resource
 anymore.  How can I recover from this situation?


 cib contents?

Oh, thanks for reminding me (I should know by now...)
Attached is the output of cibadmin -Q from before and after I ran
the commands (the commands are also attached). crm_mon still
reports this:

Clone Set: DoFencing_fence1
    fence1:0 (stonith:external/apc): Started ha-test-2
    fence1:1 (stonith:external/apc): Stopped
Clone Set: DoFencing_fence2
    fence2:0 (stonith:external/apc): Started ha-test-2
    fence2:1 (stonith:external/apc): Stopped

The status should have been cleared by those commands, though.

Regards,

Peter


crm_resource -C -r rg2:IPaddr2 -t primitive -H ha-test-1
crm_resource -C -r rg2:IPaddr2 -t primitive -H ha-test-1
crm_resource -C -r DoFencing_fence1:fence1:1 -t primitive -H ha-test-1
crm_resource -C -r DoFencing_fence1:fence1:1 -t primitive -H ha-test-1
crm_resource -C -r DoFencing_fence2:fence2:1 -t primitive -H ha-test-1
crm_resource -C -r DoFencing_fence2:fence2:1 -t primitive -H ha-test-1
crm_resource -C -r rg1:IPaddr3 -t primitive -H ha-test-1


I ran ptest and it wants to start fence1:1 and fence2:1.

the CRM probably just needs a little poke to rerun the PE.
try: crm_attribute -n last_cleanup -v `date -r`

I cleaned this up for 2.0.6 earlier this week... the problem is that
-C results in a delete in the status section, which is problematic to
detect reliably (you'll get *way* more false positives than true
hits).

So in 2.0.6 crm_resource does the equivalent of the above command automatically.


Re: [Linux-ha-dev] What happened to rsc_state?

2006-05-10 Thread Andrew Beekhof

On 5/9/06, Peter Kruse [EMAIL PROTECTED] wrote:

Hi,

Andrew Beekhof wrote:

 if you want a list of failed resources: crm_mon -1 | grep failed

 if you just want the lrm_rsc_op entries that failed, look for rc_code != 0
  and rc_code != 7 (where 7 is LSB for Safely Stopped) in the result of
 cibadmin -Ql -o status

Is that also true for fencing resources?  If I disconnect the network
from the node the power switch is attached to, crm_mon -1 prints:

Clone Set: DoFencing_fence1
    fence1:0 (stonith:external/apc): Started ha-test-2
    fence1:1 (stonith:external/apc): Stopped
Clone Set: DoFencing_fence2
    fence2:0 (stonith:external/apc): Started ha-test-2
    fence2:1 (stonith:external/apc): Stopped


but with these commands I cannot recover:

crm_resource -C -r DoFencing_fence1:fence1:1 -t primitive -H ha-test-1
crm_resource -C -r DoFencing_fence2:fence2:1 -t primitive -H ha-test-1

although cibadmin -Ql -o status does not show the failed resource
anymore.  How can I recover from this situation?



cib contents?


Re: [Linux-ha-dev] What happened to rsc_state?

2006-05-10 Thread Peter Kruse

Hi,

Andrew Beekhof wrote:

On 5/9/06, Peter Kruse [EMAIL PROTECTED] wrote:


although cibadmin -Ql -o status does not show the failed resource
anymore.  How can I recover from this situation?



cib contents?


Oh, thanks for reminding me (I should know by now...)
Attached is the output of cibadmin -Q from before and after I ran
the commands (the commands are also attached). crm_mon still
reports this:

Clone Set: DoFencing_fence1
    fence1:0 (stonith:external/apc): Started ha-test-2
    fence1:1 (stonith:external/apc): Stopped
Clone Set: DoFencing_fence2
    fence2:0 (stonith:external/apc): Started ha-test-2
    fence2:1 (stonith:external/apc): Stopped

The status should have been cleared by those commands, though.

Regards,

Peter


cibadmin-Q.before.gz
Description: GNU Zip compressed data


cibadmin-Q.after.gz
Description: GNU Zip compressed data
crm_resource -C -r rg2:IPaddr2 -t primitive -H ha-test-1
crm_resource -C -r rg2:IPaddr2 -t primitive -H ha-test-1
crm_resource -C -r DoFencing_fence1:fence1:1 -t primitive -H ha-test-1
crm_resource -C -r DoFencing_fence1:fence1:1 -t primitive -H ha-test-1
crm_resource -C -r DoFencing_fence2:fence2:1 -t primitive -H ha-test-1
crm_resource -C -r DoFencing_fence2:fence2:1 -t primitive -H ha-test-1
crm_resource -C -r rg1:IPaddr3 -t primitive -H ha-test-1


[Linux-ha-dev] What happened to rsc_state?

2006-05-09 Thread Peter Kruse

Hello,

It seems that in 2.0.5 the rsc_state attribute of lrm_rsc_op has
disappeared and has been replaced by rc_code and op_status.
But these are not equivalent. In order to remove errors from the
CIB, so that resources are started again or nodes can take over
again, I used to do something like this:
search lrm_rsc_op for rsc_state=start_failed or
rsc_state=monitor_failed and clear it with:

crm_resource -C -r $resname -t primitive -H $hostname

But this no longer works, because rc_code can be
non-zero as the result of a probe action, which in that
case does not indicate an error. So how can I find out which
resources I can safely delete from the LRM without
causing much harm?
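Andrew's reply in this thread gives a rule for this: an lrm_rsc_op counts as failed only if rc_code is neither 0 nor 7, where 7 is the LSB "stopped" code that, for example, a probe of a stopped resource returns. That rule can be applied to the output of `cibadmin -Ql -o status` with a short filter. Below is a minimal Python sketch. It assumes the 2.0.x status layout `node_state` (with `uname`) > `lrm_resource` (with `id`) > `lrm_rsc_op` (with `operation` and `rc_code`) and ignores `op_status`. `failed_ops` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# rc_code values that do not indicate a failure, per the rule in this thread:
# 0 = success, 7 = LSB "not running" (e.g. a probe of a stopped resource).
OK_RC_CODES = {"0", "7"}


def failed_ops(status_xml):
    """Return (node, resource, operation, rc_code) for every lrm_rsc_op whose
    rc_code is neither 0 nor 7.

    Assumed layout (2.0.x status section):
    node_state[uname] > lrm > lrm_resources > lrm_resource[id] > lrm_rsc_op
    """
    root = ET.fromstring(status_xml)
    result = []
    for node in root.iter("node_state"):
        uname = node.get("uname")
        for rsc in node.iter("lrm_resource"):
            for op in rsc.iter("lrm_rsc_op"):
                rc = op.get("rc_code")
                if rc is not None and rc not in OK_RC_CODES:
                    result.append((uname, rsc.get("id"), op.get("operation"), rc))
    return result
```

On a cluster node, the function could be fed live data with `failed_ops(subprocess.run(["cibadmin", "-Ql", "-o", "status"], capture_output=True, text=True).stdout)`. The resulting (resource, node) pairs are the candidates for `crm_resource -C`.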

Regards,

Peter


Re: [Linux-ha-dev] What happened to rsc_state?

2006-05-09 Thread Peter Kruse

Hi,

Andrew Beekhof wrote:


if you want a list of failed resources: crm_mon -1 | grep failed

if you just want the lrm_rsc_op entries that failed, look for rc_code != 0
 and rc_code != 7 (where 7 is LSB for Safely Stopped) in the result of
cibadmin -Ql -o status


Is that also true for fencing resources?  If I disconnect the network
from the node the power switch is attached to, crm_mon -1 prints:

Clone Set: DoFencing_fence1
    fence1:0 (stonith:external/apc): Started ha-test-2
    fence1:1 (stonith:external/apc): Stopped
Clone Set: DoFencing_fence2
    fence2:0 (stonith:external/apc): Started ha-test-2
    fence2:1 (stonith:external/apc): Stopped


but with these commands I cannot recover:

crm_resource -C -r DoFencing_fence1:fence1:1 -t primitive -H ha-test-1
crm_resource -C -r DoFencing_fence2:fence2:1 -t primitive -H ha-test-1

although cibadmin -Ql -o status does not show the failed resource
anymore.  How can I recover from this situation?

Peter
