[Linux-ha-dev] Re: [Linux-HA] APC SNMP STONITH

2007-09-18 Thread Peter Kruse
Hello, Philip Gwyn wrote: As discussed earlier, I'm writing a new SNMP STONITH plugin. The goal is for it to seamlessly work with the new and old MIBs (AP9606 vs AP7900). Ok, the old apcmastersnmp needed work, right. Instead of fixing the current apcmastersnmp.c, I started over from

Re: [Linux-ha-dev] new apc firmware breaks apcmastersnmp.so

2007-04-05 Thread Peter Kruse
Hello, Alan Robertson wrote: Dave Blaschke wrote: Also, is there some way to determine what firmware is on the APC and then pass the appropriate OID_ constant? This plugin must work for some folks (at least the original author anyway ;-) so these changes would probably break folks who are

Re: [Linux-ha-dev] new apc firmware breaks apcmastersnmp.so

2007-04-04 Thread Peter Kruse
Hi Dave, Dave Blaschke wrote: I cannot find the Config info syntax: message in the latest or any of the most recent 2.0.x code - what version of heartbeat are you using? Oops, yes that was an old version, but that doesn't make a difference concerning the oids. Regardless, you should get a

[Linux-ha-dev] new apc firmware breaks apcmastersnmp.so

2007-04-03 Thread Peter Kruse
Hello, with the v3 firmware of APC's PDUs (models AP7920 and AP7921 at least), the apcmastersnmp.so STONITH plugin does not work anymore. In apcmastersnmp.c there is: #define OID_IDENT .1.3.6.1.4.1.318.1.1.4.1.4.0 #define OID_NUM_OUTLETS .1.3.6.1.4.1.318.1.1.4.4.1.0

Re: [Linux-ha-dev] What happened to rsc_state?

2006-05-12 Thread Peter Kruse
Hi, Andrew Beekhof wrote: i ran ptest and it wants to start fence1:1 and fence2:1 the CRM probably just needs a little poke to rerun the PE. try: crm_attribute -n last_cleanup -v `date -r` ah! that did the trick, but I had to use `date -R` ;) i cleaned this up for 2.0.6 earlier this

Re: [Linux-ha-dev] What happened to rsc_state?

2006-05-10 Thread Peter Kruse
Hi, Andrew Beekhof wrote: On 5/9/06, Peter Kruse [EMAIL PROTECTED] wrote: although cibadmin -Ql -o status does not show the failed resource anymore. How can I recover from this situation? cib contents? Oh, thanks for reminding me (I should know by now...) attached is output of cibadmin

[Linux-ha-dev] What happened to rsc_state?

2006-05-09 Thread Peter Kruse
Hello, it seems that in 2.0.5 the rsc_state attribute of lrm_rsc_op has disappeared and has been replaced by rc_code and op_status, but these are not the same. In order to remove errors from the cib, so that resources are started again or nodes can take over again, I used to do something like this:

Re: [Linux-ha-dev] What happened to rsc_state?

2006-05-09 Thread Peter Kruse
Hi, Andrew Beekhof wrote: if you want a list of failed resources: crm_mon -1 | grep failed if you just want the lrm_rsc_op's that failed, look for rc_code != 0 and rc_code != 7 (where 7 is LSB for Safely Stopped) in the result of cibadmin -Ql -o status Is that also true for fencing resources?

Re: [Linux-ha-dev] File descriptor left open

2006-02-14 Thread Peter Kruse
Hello, Alan Robertson wrote: Do you have any idea where this message is coming from? Hm, no, they are from lrmd? When I started v2.0.3 yesterday, these messages appeared: Feb 13 17:08:02 ha-test-1 lrmd: [5296]: info: RA output: (rg1:fraid0:start:stderr) File descriptor 3 left open Feb

Re: [Linux-ha-dev] File descriptor left open

2006-02-13 Thread Peter Kruse
is still open? Peter Peter Kruse wrote: Hello, In my logs I get these messages like this: Feb 7 18:23:57 ha-test-1 lrmd: [2000]: info: RA output: (rg1:fpbs1:start:stderr) Filedescriptor 3 left open File descriptor 4 left open File descriptor 5 left open File descriptor 6 left open File

[Linux-ha-dev] File descriptor left open

2006-02-08 Thread Peter Kruse
Hello, In my logs I get these messages like this: Feb 7 18:23:57 ha-test-1 lrmd: [2000]: info: RA output: (rg1:fpbs1:start:stderr) Filedescriptor 3 left open File descriptor 4 left open File descriptor 5 left open File descriptor 6 left open File descriptor 7 left open File descriptor 8

Re: [Linux-ha-dev] Re: [Linux-ha-cvs] Linux-HA CVS: crm by andrew from

2006-02-05 Thread Peter Kruse
Good Morning, Huang Zhen wrote: It looks like the code treats HA_CCMUID as the group id and HA_APIGID as the user id. Right, I just stumbled across that problem, too. The error message is: ERROR: mask(io.c:readCibXmlFile): /var/lib/heartbeat/crm/cib.xml must be owned and read/writeable by user 17,

Re: [Linux-ha-dev] Tracking 2.0.3 release

2006-01-20 Thread Peter Kruse
Hello, Lars Marowsky-Bree wrote: On 2006-01-20T10:03:53, Andrew Beekhof [EMAIL PROTECTED] wrote: Woah, what are you calling crm_attribute for all the time? It's either an ipfail replacement or his way of getting resources to run on the node where they've failed the least... I

Re: [Linux-ha-dev] problem with some RA (output: cat: write error: Broken pipe)

2006-01-16 Thread Peter Kruse
Hi, Francis Montagnac wrote: I think it would be better to only reset SIGPIPE to SIG_DFL (perhaps also other signals) in the LRM just before exec'ing any external (ie: not pertaining to heartbeat itself) commands like the RA's. Is that hard to do? Or has somebody already done so? Should I

Re: [Linux-ha-dev] problem with some RA (output: cat: write error: Broken pipe)

2006-01-16 Thread Peter Kruse
Hello, Anyway, I haven't tested it yet, so I'm not sure it really fixes your issue. Could you please test it and post the result to the mailing list? TIA! Yes, the problem is gone; there are no more messages like that in syslog. Great! Peter

[Linux-ha-dev] problem with some RA (output: cat: write error: Broken pipe)

2006-01-12 Thread Peter Kruse
Hello, In one of my RAs there is a line like this: ( exportfs ; cat /proc/fs/nfs/exports ) | grep -q ^${export_dir}[ ] This line apparently produces these errors: Jan 12 13:40:08 ha-test-1 lrmd: [16217]: info: RA output: (rg1:nfs1:monitor:stderr) cat: Jan 12 13:40:08 ha-test-1 lrmd: