Hi Alan,
I tend to agree with your analysis below. I placed some debug output in
the crm_attribute.c module, and found that it went into its loop within
the call to update_attr. I had trouble getting the debug output I placed
in the function itself to show up (is cib_attrs.c the correct module?).
Then on Friday I ran out of time to continue pursuing it.

On Sun, 2007-03-25 at 16:45 -0600, Alan Robertson wrote:
> Doug Knight wrote:
> > Got it. The attached file contains the strace from the second attempt by
> > heartbeat to start the resource up as master, right up until it was
> > killed. The resource already showed failed on the gui. I zipped it up
> > using gzip.
> 
> By the way, from the system call perspective, what it's doing is
> mallocing again and again and again...
> 
> I presume it's in this function (from the top level)
>    rc = update_attr(the_cib, cib_opts, type, dest_node, set_name,
>                attr_id, attr_name, attr_value);
> 
> 
> And I further presume (with somewhat more risk) that it's in this
> function from the next level down:
> 
>         rc = the_cib->cmds->modify(the_cib, section, xml_top, NULL,
>                                    call_options|cib_quorum_override);
> 
>       cib_client_modify(CIB_OP_MODIFY...)
> 
>       cib_native_perform_op()
> 
> Which sends the request over to the CIB, where it should do this...
> 
>       cib_process_modify()
> 
>       update_xml_child(obj_root, input)
> 
> However, from cib_process_modify on, all the work takes place in the
> CIB, not in the crm_master command.  So, I presume that it doesn't get
> that far.  [Other theories are also possible, of course ;-)]
> 
> Here is my initial conclusion:
>       1)  No one else has reported this problem
>       2)  The code in question is common and is used for many things
>       3)  Therefore it's more likely that something is amiss with your
>               CIB and causing the CIB code to loop looking for the
>               subtree to modify.  If this theory is correct, there are
>               two problems one with your CIB, and one in the code.
> 
> So, could you please send the current output from cibadmin -Q to the
> list as an attachment?
> 
I've attached the output from the "cibadmin -Q" command. 

> Could you also please run crm_verify on your CIB and see if it complains
> about anything.  If it does, please fix its complaints, and try again.
> 
"crm_verify -L" did not complain about anything. However, "crm_verify
-x /var/lib/heartbeat/crm/cib.xml" had the following to say:

[dknight]# crm_verify -V -x /var/lib/heartbeat/crm/cib.xml
element cib: validity error : Element cib content does not follow the
DTD, expecting (configuration , status), got (configuration )

crm_verify[27448]: 2007/03/28_11:11:31 ERROR: validate_with_dtd: CIB
does not validate against /usr/lib64/heartbeat/crm.dtd

crm_verify[27448]: 2007/03/28_11:11:31 ERROR: main: CIB did not pass DTD
validation
Errors found during check: config not valid
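
As far as I can tell, the DTD complaint above is only about the on-disk
file having no status section, whereas the "cibadmin -Q" output does have
both configuration and status. Here is a rough sketch of that check in
Python. It is my own illustration and not part of heartbeat: the helper
name missing_cib_sections and the two tiny sample documents are made up,
and it only looks at the top-level children, not the full DTD.

```python
import xml.etree.ElementTree as ET

# Top-level sections named in the crm_verify DTD error message.
REQUIRED_CHILDREN = ("configuration", "status")


def missing_cib_sections(xml_text):
    """Return the required top-level sections absent from a <cib> document.

    Hypothetical helper for illustration; it does not validate against crm.dtd.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "cib":
        raise ValueError("root element is %r, not 'cib'" % root.tag)
    present = [child.tag for child in root]
    return [name for name in REQUIRED_CHILDREN if name not in present]


# Made-up stand-ins: one shaped like the on-disk file, one like the live CIB.
on_disk = '<cib epoch="31"><configuration/></cib>'
live = '<cib epoch="31"><configuration/><status/></cib>'

print(missing_cib_sections(on_disk))  # ['status']
print(missing_cib_sections(live))     # []
```

If that reading is right, the "crm_verify -x" error may just reflect how
the file is stored on disk rather than a real problem, but I have not
confirmed that.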

> And, could you also please tell us how you installed the system.  If you
> didn't install a package, then did you make the required user ID and
> group ID?
> 
I pulled down the 2.0.8 tarball from the linux-ha web site and built it
with ConfigureMe, with some minor changes (for my Red Hat EL5 Beta
distro, I added DFLAGS="--with-group-id=60 --with-ccmuser-id=17" in
the appropriate place to get it to build). I created the hacluster user
ID and group ID to match
(hacluster:x:17:60::/var/lib/heartbeat/cores/hacluster:/bin/bash).
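
To double-check that the IDs line up, here is a small sketch in Python
that parses the passwd entry above and compares it with the values I
passed to the build. The helper name parse_passwd_entry and the build_ids
dictionary are just for illustration; nothing here comes from heartbeat
itself.

```python
def parse_passwd_entry(line):
    """Split one /etc/passwd-style line into its seven standard fields."""
    fields = line.strip().split(":")
    if len(fields) != 7:
        raise ValueError("expected 7 colon-separated fields, got %d" % len(fields))
    name, _password, uid, gid, _gecos, home, shell = fields
    return {"name": name, "uid": int(uid), "gid": int(gid),
            "home": home, "shell": shell}


entry = parse_passwd_entry(
    "hacluster:x:17:60::/var/lib/heartbeat/cores/hacluster:/bin/bash")

# Values taken from --with-ccmuser-id=17 and --with-group-id=60.
build_ids = {"uid": 17, "gid": 60}

# Collect any field where the account and the build settings disagree.
mismatches = {key: (entry[key], wanted)
              for key, wanted in build_ids.items() if entry[key] != wanted}
print(mismatches)  # {} means the account matches the build settings
```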

> 
>       Thanks!
> 
> 
 <cib admin_epoch="0" have_quorum="true" ignore_dtd="false" num_peers="2" 
cib_feature_revision="1.3" generated="true" ccm_transition="4" 
dc_uuid="2ba293d2-2c30-4957-ad8d-59ad15bb7e26" epoch="31" num_updates="1709">
   <configuration>
     <crm_config>
       <cluster_property_set id="cib-bootstrap-options">
         <attributes>
           <nvpair id="cib-bootstrap-options-symmetric-cluster" 
name="symmetric-cluster" value="true"/>
           <nvpair id="cib-bootstrap-options-no_quorum-policy" 
name="no_quorum-policy" value="stop"/>
           <nvpair id="cib-bootstrap-options-default-resource-stickiness" 
name="default-resource-stickiness" value="0"/>
           <nvpair 
id="cib-bootstrap-options-default-resource-failure-stickiness" 
name="default-resource-failure-stickiness" value="0"/>
           <nvpair id="cib-bootstrap-options-stonith-enabled" 
name="stonith-enabled" value="false"/>
           <nvpair id="cib-bootstrap-options-stonith-action" 
name="stonith-action" value="reboot"/>
           <nvpair id="cib-bootstrap-options-stop-orphan-resources" 
name="stop-orphan-resources" value="true"/>
           <nvpair id="cib-bootstrap-options-stop-orphan-actions" 
name="stop-orphan-actions" value="true"/>
           <nvpair id="cib-bootstrap-options-remove-after-stop" 
name="remove-after-stop" value="false"/>
           <nvpair id="cib-bootstrap-options-short-resource-names" 
name="short-resource-names" value="true"/>
           <nvpair id="cib-bootstrap-options-transition-idle-timeout" 
name="transition-idle-timeout" value="5min"/>
           <nvpair id="cib-bootstrap-options-default-action-timeout" 
name="default-action-timeout" value="5s"/>
           <nvpair id="cib-bootstrap-options-is-managed-default" 
name="is-managed-default" value="true"/>
           <nvpair name="last-lrm-refresh" 
id="cib-bootstrap-options-last-lrm-refresh" value="1174666357"/>
         </attributes>
       </cluster_property_set>
     </crm_config>
     <nodes>
       <node uname="arc-tkincaidlx.wsicorp.com" type="normal" 
id="2ba293d2-2c30-4957-ad8d-59ad15bb7e26">
         <instance_attributes id="master-2ba293d2-2c30-4957-ad8d-59ad15bb7e26">
           <attributes/>
         </instance_attributes>
       </node>
       <node uname="arc-dknightlx" type="normal" 
id="8c16c69e-f753-49cf-ba89-3ae421940042">
         <instance_attributes id="master-8c16c69e-f753-49cf-ba89-3ae421940042">
           <attributes/>
         </instance_attributes>
       </node>
     </nodes>
     <resources>
       <primitive class="ocf" id="IPaddr_147_81_84_133" provider="heartbeat" 
type="IPaddr">
         <operations>
           <op id="IPaddr_147_81_84_133_mon" interval="5s" name="monitor" 
timeout="5s"/>
         </operations>
         <instance_attributes id="IPaddr_147_81_84_133_inst_attr">
           <attributes>
             <nvpair id="IPaddr_147_81_84_133_attr_0" name="ip" 
value="147.81.84.133"/>
           </attributes>
         </instance_attributes>
       </primitive>
       <primitive class="ocf" type="pgsql" provider="heartbeat" id="pgsql_5555">
         <instance_attributes id="pgsql_5555_instance_attrs">
           <attributes>
             <nvpair id="f140c2b7-7ecb-4b20-9776-a146b6641859" name="pgctl" 
value="/usr/local/pgsql/bin/pg_ctl"/>
             <nvpair id="91b1e157-8bf9-4147-ba24-0e40148ac292" name="start_opt" 
value="-p 5555"/>
             <nvpair id="666a790a-d4ee-4f8e-b306-6a1c28e949e3" name="psql" 
value="/usr/local/pgsql/bin/psql"/>
             <nvpair id="712b74c5-2d84-4c02-ac78-f41a43f212a1" name="pgdata" 
value="/usr/local/pgsql/data_hb"/>
             <nvpair id="20cc85f2-3792-40f0-968f-81dc10c165d9" name="pgdba" 
value="postgres"/>
             <nvpair name="target_role" id="pgsql_5555_target_role" 
value="started"/>
             <nvpair id="96a6e2f0-31b7-4e17-983c-1ae1a31158e6" name="logfile" 
value="/var/log/pg.log"/>
             <nvpair id="e3ae93db-f195-4b12-968e-ff66212142d8" name="pgport" 
value="5555"/>
           </attributes>
         </instance_attributes>
         <operations>
           <op id="cce8767f-f9a4-4cc4-9a81-365eed513275" name="monitor" 
interval="30" timeout="30" start_delay="10" disabled="false" role="Started"/>
           <op id="e3740989-2763-4009-8399-2437923b0043" name="start" 
timeout="120" start_delay="0" disabled="false" role="Started"/>
           <op id="58e51f97-156c-4cf6-aba8-1faf61225e16" name="stop" 
timeout="120" start_delay="0" disabled="false" role="Started"/>
           <op id="36ea9ab4-401e-45c6-8097-8fe7b8a9bfe9" name="status" 
timeout="60" start_delay="0" disabled="false" role="Started"/>
         </operations>
       </primitive>
       <master_slave id="ms_pgsql_wal_5556">
         <instance_attributes id="ms_pgsql_wal_5556_instance_attrs">
           <attributes>
             <nvpair id="ms_pgsql_wal_5556_clone_max" name="clone_max" 
value="1"/>
             <nvpair id="ms_pgsql_wal_5556_clone_node_max" 
name="clone_node_max" value="1"/>
             <nvpair id="ms_pgsql_wal_5556_master_max" name="master_max" 
value="1"/>
             <nvpair id="ms_pgsql_wal_5556_master_node_max" 
name="master_node_max" value="1"/>
             <nvpair id="ms_pgsql_wal_5556_target_role" name="target_role" 
value="started"/>
           </attributes>
         </instance_attributes>
         <primitive class="ocf" type="stateful_pgsql" provider="wsi" 
id="pgsql_wal_5556">
           <instance_attributes id="pgsql_wal_5556_instance_attrs">
             <attributes>
               <nvpair id="9f2385d3-24f8-439b-90e8-df1f3082f3b1" name="pgctl" 
value="/usr/local/pgsql/bin/pg_ctl"/>
               <nvpair id="3d50c5f2-86af-4901-8728-f6e5624aa9f6" 
name="start_opt" value="-p 5556"/>
               <nvpair id="f4261c4d-e05e-4dc1-909c-6103f2480d9d" name="psql" 
value="/usr/local/pgsql/bin/psql"/>
               <nvpair id="2f0c2ecc-f678-46f4-9b83-f5ea4775a8de" name="pgdata" 
value="/usr/local/pgsql/data_hb_wal"/>
               <nvpair id="017b40ca-f90c-4553-a356-5baacd5c3ccf" name="pgdba" 
value="postgres"/>
               <nvpair id="f4ec04a7-e60b-4e40-ac8f-e7641ae31373" name="pgport" 
value="5556"/>
               <nvpair id="c44e85e6-76b9-408c-8c8f-76a459bd813e" name="pgdb" 
value="template1"/>
               <nvpair id="6ea6a5e6-92d0-4803-8a12-efb6ebd7921c" name="logfile" 
value="/var/log/pg_wal.log"/>
               <nvpair id="b736bea0-9377-4eb2-af53-c1f0c15490c2" 
name="trigger_file" value="/tmp/postgres.template1.5556"/>
               <nvpair name="target_role" id="pgsql_wal_5556:0_target_role" 
value="stopped"/>
             </attributes>
           </instance_attributes>
           <operations>
             <op id="09448898-c8da-4d2c-97f2-6f6aa1fd7092" name="start" 
timeout="120"/>
             <op id="34392461-8f23-4d14-b998-71b1c799f1b8" name="stop" 
timeout="120"/>
             <op id="90682d5b-4439-44f7-8568-1069a21a2b11" name="promote" 
timeout="60"/>
             <op id="4366022e-f7fc-4bac-a2bb-4b2422cdf5a4" name="demote" 
timeout="60"/>
             <op id="99030a86-9293-48ce-a9d9-746624683994" name="monitor" 
interval="30" timeout="30" start_delay="0"/>
           </operations>
         </primitive>
       </master_slave>
     </resources>
     <constraints>
       <rsc_colocation id="colocation_pgsql_5555" from="pgsql_5555" 
to="IPaddr_147_81_84_133" score="INFINITY"/>
       <rsc_location id="rsc_location_pgsql_wal_5556" rsc="ms_pgsql_wal_5556">
         <rule id="prefered_rsc_location_pgsql_wal_5556" score="100">
           <expression attribute="#uname" 
id="ef0fac9e-f3da-470b-9aa4-fd11fc694a03" operation="eq" value="arc-dknightlx"/>
         </rule>
       </rsc_location>
       <rsc_location id="rsc_location_IPaddr_147_81_84_133" 
rsc="IPaddr_147_81_84_133">
         <rule id="prefered_rsc_location_IPaddr_147_81_84_133" score="100">
           <expression attribute="#uname" 
id="prefered_location_IPaddr_147_81_84_133_expr" operation="eq" 
value="arc-dknightlx"/>
         </rule>
       </rsc_location>
     </constraints>
   </configuration>
   <status>
     <node_state uname="arc-dknightlx" crmd="online" in_ccm="true" ha="active" 
join="member" id="8c16c69e-f753-49cf-ba89-3ae421940042" shutdown="0" 
expected="member" crm-debug-origin="do_update_resource">
       <lrm id="8c16c69e-f753-49cf-ba89-3ae421940042">
         <lrm_resources>
           <lrm_resource id="IPaddr_147_81_84_133" type="IPaddr" class="ocf" 
provider="heartbeat">
             <lrm_rsc_op id="IPaddr_147_81_84_133_monitor_0" 
operation="monitor" crm-debug-origin="do_update_resource" 
transition_key="6:2:a6be1dbc-ab05-4aba-8865-4dd033740982" 
transition_magic="0:7;6:2:a6be1dbc-ab05-4aba-8865-4dd033740982" call_id="2" 
crm_feature_set="1.0.7" rc_code="7" op_status="0" interval="0" 
op_digest="26517a2a9fde8bc02319582a3ac78d34"/>
             <lrm_rsc_op id="IPaddr_147_81_84_133_start_0" operation="start" 
crm-debug-origin="do_update_resource" 
transition_key="11:2:a6be1dbc-ab05-4aba-8865-4dd033740982" 
transition_magic="0:0;11:2:a6be1dbc-ab05-4aba-8865-4dd033740982" call_id="5" 
crm_feature_set="1.0.7" rc_code="0" op_status="0" interval="0" 
op_digest="26517a2a9fde8bc02319582a3ac78d34"/>
             <lrm_rsc_op id="IPaddr_147_81_84_133_monitor_5000" 
operation="monitor" crm-debug-origin="do_update_resource" 
transition_key="7:3:a6be1dbc-ab05-4aba-8865-4dd033740982" 
transition_magic="0:0;7:3:a6be1dbc-ab05-4aba-8865-4dd033740982" call_id="7" 
crm_feature_set="1.0.7" rc_code="0" op_status="0" interval="5000" 
op_digest="26517a2a9fde8bc02319582a3ac78d34"/>
           </lrm_resource>
           <lrm_resource id="pgsql_5555" type="pgsql" class="ocf" 
provider="heartbeat">
             <lrm_rsc_op id="pgsql_5555_monitor_0" operation="monitor" 
crm-debug-origin="do_update_resource" 
transition_key="7:2:a6be1dbc-ab05-4aba-8865-4dd033740982" 
transition_magic="0:7;7:2:a6be1dbc-ab05-4aba-8865-4dd033740982" call_id="3" 
crm_feature_set="1.0.7" rc_code="7" op_status="0" interval="0" 
op_digest="eaf9677755ccc70f170079c7447b1aee"/>
             <lrm_rsc_op id="pgsql_5555_start_0" operation="start" 
crm-debug-origin="do_update_resource" 
transition_key="14:2:a6be1dbc-ab05-4aba-8865-4dd033740982" 
transition_magic="0:0;14:2:a6be1dbc-ab05-4aba-8865-4dd033740982" call_id="6" 
crm_feature_set="1.0.7" rc_code="0" op_status="0" interval="0" 
op_digest="eaf9677755ccc70f170079c7447b1aee"/>
             <lrm_rsc_op id="pgsql_5555_monitor_30000" operation="monitor" 
crm-debug-origin="do_update_resource" 
transition_key="10:3:a6be1dbc-ab05-4aba-8865-4dd033740982" 
transition_magic="0:0;10:3:a6be1dbc-ab05-4aba-8865-4dd033740982" call_id="8" 
crm_feature_set="1.0.7" rc_code="0" op_status="0" interval="30000" 
op_digest="eaf9677755ccc70f170079c7447b1aee"/>
           </lrm_resource>
           <lrm_resource id="pgsql_wal_5556:0" type="stateful_pgsql" 
class="ocf" provider="wsi">
             <lrm_rsc_op id="pgsql_wal_5556:0_monitor_0" operation="monitor" 
crm-debug-origin="do_update_resource" 
transition_key="8:2:a6be1dbc-ab05-4aba-8865-4dd033740982" 
transition_magic="0:7;8:2:a6be1dbc-ab05-4aba-8865-4dd033740982" call_id="4" 
crm_feature_set="1.0.7" rc_code="7" op_status="0" interval="0" 
op_digest="bd37d55acc806bd4fd681921029a7520"/>
           </lrm_resource>
         </lrm_resources>
       </lrm>
       <transient_attributes id="8c16c69e-f753-49cf-ba89-3ae421940042">
         <instance_attributes id="status-8c16c69e-f753-49cf-ba89-3ae421940042">
           <attributes>
             <nvpair 
id="status-8c16c69e-f753-49cf-ba89-3ae421940042-probe_complete" 
name="probe_complete" value="true"/>
           </attributes>
         </instance_attributes>
       </transient_attributes>
     </node_state>
     <node_state uname="arc-tkincaidlx.wsicorp.com" ha="active" crmd="online" 
shutdown="0" in_ccm="true" join="member" expected="member" 
id="2ba293d2-2c30-4957-ad8d-59ad15bb7e26" crm-debug-origin="do_update_resource">
       <transient_attributes id="2ba293d2-2c30-4957-ad8d-59ad15bb7e26">
         <instance_attributes id="status-2ba293d2-2c30-4957-ad8d-59ad15bb7e26">
           <attributes>
             <nvpair 
id="status-2ba293d2-2c30-4957-ad8d-59ad15bb7e26-probe_complete" 
name="probe_complete" value="true"/>
           </attributes>
         </instance_attributes>
       </transient_attributes>
       <lrm id="2ba293d2-2c30-4957-ad8d-59ad15bb7e26">
         <lrm_resources>
           <lrm_resource id="IPaddr_147_81_84_133" type="IPaddr" class="ocf" 
provider="heartbeat">
             <lrm_rsc_op id="IPaddr_147_81_84_133_monitor_0" 
operation="monitor" crm-debug-origin="build_active_RAs" 
transition_key="5:2:141ae0ad-0626-4755-b17f-20b4f84606d9" 
transition_magic="4:7;5:2:141ae0ad-0626-4755-b17f-20b4f84606d9" call_id="2" 
crm_feature_set="1.0.7" rc_code="7" op_status="4" interval="0" 
op_digest="26517a2a9fde8bc02319582a3ac78d34"/>
             <lrm_rsc_op id="IPaddr_147_81_84_133_start_0" operation="start" 
crm-debug-origin="build_active_RAs" 
transition_key="10:27:141ae0ad-0626-4755-b17f-20b4f84606d9" 
transition_magic="0:0;10:27:141ae0ad-0626-4755-b17f-20b4f84606d9" call_id="18" 
crm_feature_set="1.0.7" rc_code="0" op_status="0" interval="0" 
op_digest="26517a2a9fde8bc02319582a3ac78d34"/>
             <lrm_rsc_op id="IPaddr_147_81_84_133_monitor_5000" 
operation="monitor" crm-debug-origin="build_active_RAs" 
transition_key="6:0:a6be1dbc-ab05-4aba-8865-4dd033740982" 
transition_magic="0:0;6:0:a6be1dbc-ab05-4aba-8865-4dd033740982" call_id="20" 
crm_feature_set="1.0.7" rc_code="0" op_status="0" interval="5000" 
op_digest="26517a2a9fde8bc02319582a3ac78d34"/>
             <lrm_rsc_op id="IPaddr_147_81_84_133_stop_0" operation="stop" 
crm-debug-origin="do_update_resource" 
transition_key="10:2:a6be1dbc-ab05-4aba-8865-4dd033740982" 
transition_magic="0:0;10:2:a6be1dbc-ab05-4aba-8865-4dd033740982" call_id="23" 
crm_feature_set="1.0.7" rc_code="0" op_status="0" interval="0" 
op_digest="26517a2a9fde8bc02319582a3ac78d34"/>
           </lrm_resource>
           <lrm_resource id="pgsql_5555" type="pgsql" class="ocf" 
provider="heartbeat">
             <lrm_rsc_op id="pgsql_5555_monitor_0" operation="monitor" 
crm-debug-origin="build_active_RAs" 
transition_key="6:2:141ae0ad-0626-4755-b17f-20b4f84606d9" 
transition_magic="4:7;6:2:141ae0ad-0626-4755-b17f-20b4f84606d9" call_id="3" 
crm_feature_set="1.0.7" rc_code="7" op_status="4" interval="0" 
op_digest="eaf9677755ccc70f170079c7447b1aee"/>
             <lrm_rsc_op id="pgsql_5555_start_0" operation="start" 
crm-debug-origin="build_active_RAs" 
transition_key="13:27:141ae0ad-0626-4755-b17f-20b4f84606d9" 
transition_magic="0:0;13:27:141ae0ad-0626-4755-b17f-20b4f84606d9" call_id="19" 
crm_feature_set="1.0.7" rc_code="0" op_status="0" interval="0" 
op_digest="eaf9677755ccc70f170079c7447b1aee"/>
             <lrm_rsc_op id="pgsql_5555_monitor_30000" operation="monitor" 
crm-debug-origin="build_active_RAs" 
transition_key="9:0:a6be1dbc-ab05-4aba-8865-4dd033740982" 
transition_magic="0:0;9:0:a6be1dbc-ab05-4aba-8865-4dd033740982" call_id="21" 
crm_feature_set="1.0.7" rc_code="0" op_status="0" interval="30000" 
op_digest="eaf9677755ccc70f170079c7447b1aee"/>
             <lrm_rsc_op id="pgsql_5555_stop_0" operation="stop" 
crm-debug-origin="do_update_resource" 
transition_key="13:2:a6be1dbc-ab05-4aba-8865-4dd033740982" 
transition_magic="0:0;13:2:a6be1dbc-ab05-4aba-8865-4dd033740982" call_id="25" 
crm_feature_set="1.0.7" rc_code="0" op_status="0" interval="0" 
op_digest="eaf9677755ccc70f170079c7447b1aee"/>
           </lrm_resource>
           <lrm_resource id="pgsql_wal_5556:0" type="stateful_pgsql" 
class="ocf" provider="wsi">
             <lrm_rsc_op id="pgsql_wal_5556:0_monitor_0" operation="monitor" 
crm-debug-origin="build_active_RAs" 
transition_key="5:14:141ae0ad-0626-4755-b17f-20b4f84606d9" 
transition_magic="4:7;5:14:141ae0ad-0626-4755-b17f-20b4f84606d9" call_id="11" 
crm_feature_set="1.0.7" rc_code="7" op_status="4" interval="0" 
op_digest="bd37d55acc806bd4fd681921029a7520"/>
             <lrm_rsc_op id="pgsql_wal_5556:0_start_0" operation="start" 
crm-debug-origin="build_active_RAs" 
transition_key="11:21:141ae0ad-0626-4755-b17f-20b4f84606d9" 
transition_magic="0:0;11:21:141ae0ad-0626-4755-b17f-20b4f84606d9" call_id="12" 
crm_feature_set="1.0.7" rc_code="0" op_status="0" interval="0" 
op_digest="9de77b451cb1761a72d3b70817f8a634"/>
             <lrm_rsc_op id="pgsql_wal_5556:0_stop_0" operation="stop" 
crm-debug-origin="build_active_RAs" 
transition_key="4:22:141ae0ad-0626-4755-b17f-20b4f84606d9" 
transition_magic="0:0;4:22:141ae0ad-0626-4755-b17f-20b4f84606d9" call_id="17" 
crm_feature_set="1.0.7" rc_code="0" op_status="0" interval="0" 
op_digest="bd37d55acc806bd4fd681921029a7520"/>
           </lrm_resource>
         </lrm_resources>
       </lrm>
     </node_state>
   </status>
 </cib>

_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
