Edward Clay wrote:
I am looking for some help with the external/riloe stonith plug-in. I have been working with the one that ships in SLES 10 SP1 (heartbeat 2.0.8-0.19). I used the following XML to create the clone resource:

<clone id="CL_stonithset_node1">
  <instance_attributes id="CL_stonithset_node1">
    <attributes>
      <nvpair id="CL_stonithset_node1_clone_node_max" name="clone_node_max" value="1"/>
    </attributes>
  </instance_attributes>
  <primitive id="CL_stonith_node1" class="stonith" type="external/riloe" provider="heartbeat">
    <operations>
      <op name="monitor" interval="30s" timeout="20s" id="CL_stonith_node1_monitor"/>
      <op name="start" timeout="60s" id="CL_stonith_node1_start"/>
    </operations>
    <instance_attributes id="CL_stonith_node1">
      <attributes>
        <nvpair id="CL_stonith_node1_hostlist" name="hostlist" value="node1"/>
        <nvpair id="CL_stonith_node1_RI_HOSTRI" name="RI_HOSTRI" value="il-node1"/>
        <nvpair id="CL_stonith_node1_RI_LOGIN" name="RI_LOGIN" value="Administrator"/>
        <nvpair id="CL_stonith_node1_RI_PASSWORD" name="RI_PASSWORD" value="password"/>
      </attributes>
    </instance_attributes>
  </primitive>
</clone>

Sample errors in the messages log:

Jun 27 11:30:41 node1 haclient: on_event:evt:cib_changed
Jun 27 11:30:41 node1 stonithd: [5318]: info: Cannot get parameter hostname from StonithNVpair
Jun 27 11:30:41 node1 stonithd: [5318]: ERROR: Invalid config info for external/riloe device.
Jun 27 11:30:41 node1 lrmd: [12035]: ERROR: sending stonithRA op to stonithd failed.
Jun 27 11:30:41 node1 cib: [12048]: info: write_cib_contents: Wrote version 0.46.2095 of the CIB to disk
This problem has already been fixed under Novell bug 266551, so you should be able to get the fix from them (sorry, can't be any help there). A less-preferred but quicker alternative (only in a non-production environment) is to apply the patch at http://hg.linux-ha.org/dev/rev/48477653f995 directly to your system to see if you get further.
This error also shows up a couple of times in a row:

Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error parsing token: couldnt find attr_name
Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error at or before: ="ilo_hostname" uniq
Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error parsing token: error parsing child
Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error at or before: <longdesc lang=en
Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error parsing token: error parsing child
Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error at or before: > <parameter name="
Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error parsing token: error parsing child
Jun 27 11:30:41 node1 crmd: [5320]: ERROR: parse_xml: Error at or before: c> <parameters> <pa
Jun 27 11:30:41 node1 crmd: [5320]: ERROR: crm_abort: find_xml_node: Triggered non-fatal assert at xml.c:75 : root != NULL

The resource is created OK, but I can't start it. It gives an error that it can't run anywhere. I also see errors about not being able to find hostname. So I did some digging in the riloe file, and it shows the RI_ entries as legacy. Lower in the file it shows some ilo_ values. So I tried creating the same XML as above with the new ilo_ equivalents:


<clone id="CL_stonithset_node1">
  <instance_attributes id="CL_stonithset_node1">
    <attributes>
      <nvpair id="CL_stonithset_node1_clone_node_max" name="clone_node_max" value="1"/>
    </attributes>
  </instance_attributes>
  <primitive id="CL_stonith_node1" class="stonith" type="external/riloe" provider="heartbeat">
    <operations>
      <op name="monitor" interval="30s" timeout="20s" id="CL_stonith_node1_monitor"/>
      <op name="start" timeout="60s" id="CL_stonith_node1_start"/>
    </operations>
    <instance_attributes id="CL_stonith_node1">
      <attributes>
        <nvpair id="CL_stonith_node1_hostlist" name="hostlist" value="node1"/>
        <nvpair id="CL_stonith_node1_ilo_hostname" name="ilo_hostname" value="il-node1"/>
        <nvpair id="CL_stonith_node1_ilo_user" name="ilo_user" value="Administrator"/>
        <nvpair id="CL_stonith_node1_ilo_password" name="ilo_password" value="password"/>
        <nvpair id="CL_stonith_node1_ilo_protocol" name="ilo_protocol" value="1.2"/>
      </attributes>
    </instance_attributes>
  </primitive>
</clone>

Same results: the resource is created but doesn't start. I can ping both the hostname (node1) and the iLO hostname (il-node1) from all boxes. I am able to ssh and https to the iLO card and log in with the admin account. I have attached the riloe plug-in that I am trying to use. The hardware is a dl350 running iLO firmware 1.22.

Does anyone know what type of connection the plug-in makes to the iLO card? Do I need to have the ilo2 device at a certain firmware version? Do I need a driver loaded for the iLO card to work, or does it communicate with it through ssh or https?
HTTPS
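
Since the plug-in talks to the iLO card over HTTPS, a quick first check is whether each cluster node can open a TCP connection to the iLO's HTTPS port at all. The following is only a minimal sketch: the helper name ilo_port_reachable is made up for illustration, and port 443 is assumed because it is the standard HTTPS port (an iLO configured with a different SSL port would need that value instead). It tests TCP reachability only, not the login or the protocol the plug-in speaks.

```python
import socket


def ilo_port_reachable(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: it only checks that something is listening on
    the port. It does not log in or exercise the plug-in's protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


# Example call, using the iLO hostname from the configuration above:
#   ilo_port_reachable("il-node1")
```

Running this from every node that may have to fence node1 would show whether a firewall or routing problem sits between that node and the iLO card.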
What can I do to troubleshoot this problem?
You can add "debug 1" to your ha.cf or, for even more detailed debug info, run the stonith command with the -d option:

export RI_HOSTRI=il-node1
::
stonith -d -t external/riloe hostlist=node1 -S

to verify your stonith environment.
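
If you want to repeat that check from a script, for example once per node, the invocation can be assembled programmatically. This is only a sketch: the function names build_stonith_check and run_stonith_check are hypothetical, and the only flags used are the ones shown above (-d, -t, -S). Which environment variables the plug-in needs depends on your riloe version, so they are passed in by the caller rather than assumed.

```python
import os
import subprocess


def build_stonith_check(plugin, params, env_params, debug=True):
    """Build argv and environment for a 'stonith ... -S' status check.

    plugin     -- plug-in type, e.g. "external/riloe"
    params     -- name=value pairs for the command line, e.g. hostlist
    env_params -- variables to export, e.g. RI_HOSTRI
    """
    argv = ["stonith"]
    if debug:
        argv.append("-d")
    argv += ["-t", plugin]
    argv += ["%s=%s" % (name, value) for name, value in params.items()]
    argv.append("-S")
    # Start from the current environment and add the plug-in settings.
    env = dict(os.environ)
    env.update(env_params)
    return argv, env


def run_stonith_check(plugin, params, env_params):
    """Run the check and return the exit status of the stonith command."""
    argv, env = build_stonith_check(plugin, params, env_params)
    return subprocess.run(argv, env=env).returncode


# Example call mirroring the command above:
#   run_stonith_check("external/riloe", {"hostlist": "node1"},
#                     {"RI_HOSTRI": "il-node1"})
```

Keeping the command construction separate from the call that runs it makes it easy to print the exact command line before executing it on a cluster node.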

TIA
Edward
------------------------------------------------------------------------

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

