Hi Alan,
I tend to agree with your analysis below. I added some debug output to
the crm_attribute.c module and found that it enters its loop within
the call to update_attr. I had trouble getting any output from the debug
statements I placed inside the function itself (is cib_attrs.c the
correct module?). Then on Friday I ran out of time to continue pursuing it.
On Sun, 2007-03-25 at 16:45 -0600, Alan Robertson wrote:
> Doug Knight wrote:
> > Got it. The attached file contains the strace from the second attempt by
> > heartbeat to start the resource up as master, right up until it was
> > killed. The resource already showed failed on the gui. I zipped it up
> > using gzip.
>
> By the way, from the system call perspective, what it's doing is
> mallocing again and again and again...
>
> I presume it's in this function (from the top level)
> rc = update_attr(the_cib, cib_opts, type, dest_node, set_name,
> attr_id, attr_name, attr_value);
>
>
> And I further presume (with somewhat more risk) that it's in this
> function from the next level down:
>
> rc = the_cib->cmds->modify(the_cib, section, xml_top, NULL,
> call_options|cib_quorum_override);
>
> cib_client_modify(CIB_OP_MODIFY...)
>
> cib_native_perform_op()
>
> Which sends the request over to the CIB, where it should do this...
>
> cib_process_modify()
>
> update_xml_child(obj_root, input)
>
> However, from cib_process_modify on, all the work takes place in the
> CIB, not in the crm_master command. So, I presume that it doesn't get
> that far. [Other theories are also possible, of course ;-)]
>
> Here is my initial conclusion:
> 1) No one else has reported this problem
> 2) The code in question is common and is used for many things
> 3) Therefore it's more likely that something is amiss with your
> CIB and causing the CIB code to loop looking for the
> subtree to modify. If this theory is correct, there are
> two problems: one with your CIB, and one in the code.
>
> So, could you please send the current output from cibadmin -Q to the
> list as an attachment?
>
I've attached the output from the "cibadmin -Q" command.
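In case it helps narrow down the "something is amiss with your CIB" theory, here is a minimal sketch of one sanity check that could be run over the cibadmin -Q output: looking for id values that occur more than once inside the configuration section. That duplicate ids have anything to do with the loop is purely an assumption on my part, and the function name duplicate_config_ids and the sample document are made up for illustration. The status section is skipped on purpose, since it reuses node and resource ids.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_config_ids(cib_xml):
    """Return the sorted list of id values used more than once
    under <configuration> in a CIB document given as a string."""
    root = ET.fromstring(cib_xml)
    config = root.find("configuration")
    if config is None:
        return []
    # Count every id attribute found on <configuration> and its descendants.
    counts = Counter(
        el.get("id") for el in config.iter() if el.get("id") is not None
    )
    return sorted(id_ for id_, n in counts.items() if n > 1)


# Hypothetical sample, not taken from the attached CIB.
sample = """<cib><configuration><resources>
<primitive id="a"><instance_attributes id="a_attrs"/></primitive>
<primitive id="a"/>
</resources></configuration></cib>"""

if __name__ == "__main__":
    print(duplicate_config_ids(sample))
```

An empty list would only rule out this one kind of problem; it says nothing about whether the CIB is otherwise sane.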
> Could you also please run crm_verify on your CIB and see if it complains
> about anything. If it does, please fix its complaints, and try again.
>
"crm_verify -L" did not complain about anything. However, "crm_verify
-x /var/lib/heartbeat/crm/cib.xml" had the following to say:
[dknight]# crm_verify -V -x /var/lib/heartbeat/crm/cib.xml
element cib: validity error : Element cib content does not follow the
DTD, expecting (configuration , status), got (configuration )
crm_verify[27448]: 2007/03/28_11:11:31 ERROR: validate_with_dtd: CIB
does not validate against /usr/lib64/heartbeat/crm.dtd
crm_verify[27448]: 2007/03/28_11:11:31 ERROR: main: CIB did not pass DTD
validation
Errors found during check: config not valid
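Reading the DTD message literally, the content model for cib is (configuration, status), and the on-disk file only has configuration. My guess, which I have not verified against the source, is that the status section is simply not written to cib.xml, which would explain why -L is happy and -x is not. A small sketch of the same check outside of crm_verify follows; the function name missing_sections and the two sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

# Top-level children named in the DTD error message above.
REQUIRED_SECTIONS = ("configuration", "status")


def missing_sections(cib_xml):
    """Return the required top-level sections absent from a <cib> document."""
    root = ET.fromstring(cib_xml)
    if root.tag != "cib":
        raise ValueError("expected <cib> root, got <%s>" % root.tag)
    present = {child.tag for child in root}
    return [name for name in REQUIRED_SECTIONS if name not in present]


if __name__ == "__main__":
    # Stand-ins for the on-disk file and the live CIB (assumed shapes).
    print(missing_sections("<cib><configuration/></cib>"))
    print(missing_sections("<cib><configuration/><status/></cib>"))
```

If the on-disk file reports only "status" as missing, the -x complaint is probably harmless for the problem at hand.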
> And, could you also please tell us how you installed the system. If you
> didn't install a package, then did you make the required user ID and
> group ID?
>
I pulled down the 2.0.8 tarball from the linux-ha web site and used
ConfigureMe to build it, with some minor changes (for my Red Hat EL5
Beta distro, I added DFLAGS="--with-group-id=60 --with-ccmuser-id=17" in
the appropriate place to get it to build). I created the hacluster user
ID and group ID to match
(hacluster:x:17:60::/var/lib/heartbeat/cores/hacluster:/bin/bash).
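To double-check that the account really matches the IDs passed at build time, something like the following sketch could be used. It just parses a passwd-style line and compares the uid and gid against the expected values; the helper names parse_passwd_entry and ids_match are made up for illustration, and the entry is the one quoted above.

```python
def parse_passwd_entry(line):
    """Split one /etc/passwd-style line into (name, uid, gid)."""
    fields = line.strip().split(":")
    if len(fields) != 7:
        raise ValueError("expected 7 colon-separated fields")
    return fields[0], int(fields[2]), int(fields[3])


def ids_match(line, expected_uid, expected_gid):
    """True if the entry's uid and gid equal the values used at build time."""
    _, uid, gid = parse_passwd_entry(line)
    return uid == expected_uid and gid == expected_gid


entry = "hacluster:x:17:60::/var/lib/heartbeat/cores/hacluster:/bin/bash"

if __name__ == "__main__":
    # 17 and 60 are the values from the DFLAGS line above.
    print(ids_match(entry, expected_uid=17, expected_gid=60))
```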
>
> Thanks!
>
>
<cib admin_epoch="0" have_quorum="true" ignore_dtd="false" num_peers="2"
cib_feature_revision="1.3" generated="true" ccm_transition="4"
dc_uuid="2ba293d2-2c30-4957-ad8d-59ad15bb7e26" epoch="31" num_updates="1709">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<attributes>
<nvpair id="cib-bootstrap-options-symmetric-cluster"
name="symmetric-cluster" value="true"/>
<nvpair id="cib-bootstrap-options-no_quorum-policy"
name="no_quorum-policy" value="stop"/>
<nvpair id="cib-bootstrap-options-default-resource-stickiness"
name="default-resource-stickiness" value="0"/>
<nvpair
id="cib-bootstrap-options-default-resource-failure-stickiness"
name="default-resource-failure-stickiness" value="0"/>
<nvpair id="cib-bootstrap-options-stonith-enabled"
name="stonith-enabled" value="false"/>
<nvpair id="cib-bootstrap-options-stonith-action"
name="stonith-action" value="reboot"/>
<nvpair id="cib-bootstrap-options-stop-orphan-resources"
name="stop-orphan-resources" value="true"/>
<nvpair id="cib-bootstrap-options-stop-orphan-actions"
name="stop-orphan-actions" value="true"/>
<nvpair id="cib-bootstrap-options-remove-after-stop"
name="remove-after-stop" value="false"/>
<nvpair id="cib-bootstrap-options-short-resource-names"
name="short-resource-names" value="true"/>
<nvpair id="cib-bootstrap-options-transition-idle-timeout"
name="transition-idle-timeout" value="5min"/>
<nvpair id="cib-bootstrap-options-default-action-timeout"
name="default-action-timeout" value="5s"/>
<nvpair id="cib-bootstrap-options-is-managed-default"
name="is-managed-default" value="true"/>
<nvpair name="last-lrm-refresh"
id="cib-bootstrap-options-last-lrm-refresh" value="1174666357"/>
</attributes>
</cluster_property_set>
</crm_config>
<nodes>
<node uname="arc-tkincaidlx.wsicorp.com" type="normal"
id="2ba293d2-2c30-4957-ad8d-59ad15bb7e26">
<instance_attributes id="master-2ba293d2-2c30-4957-ad8d-59ad15bb7e26">
<attributes/>
</instance_attributes>
</node>
<node uname="arc-dknightlx" type="normal"
id="8c16c69e-f753-49cf-ba89-3ae421940042">
<instance_attributes id="master-8c16c69e-f753-49cf-ba89-3ae421940042">
<attributes/>
</instance_attributes>
</node>
</nodes>
<resources>
<primitive class="ocf" id="IPaddr_147_81_84_133" provider="heartbeat"
type="IPaddr">
<operations>
<op id="IPaddr_147_81_84_133_mon" interval="5s" name="monitor"
timeout="5s"/>
</operations>
<instance_attributes id="IPaddr_147_81_84_133_inst_attr">
<attributes>
<nvpair id="IPaddr_147_81_84_133_attr_0" name="ip"
value="147.81.84.133"/>
</attributes>
</instance_attributes>
</primitive>
<primitive class="ocf" type="pgsql" provider="heartbeat" id="pgsql_5555">
<instance_attributes id="pgsql_5555_instance_attrs">
<attributes>
<nvpair id="f140c2b7-7ecb-4b20-9776-a146b6641859" name="pgctl"
value="/usr/local/pgsql/bin/pg_ctl"/>
<nvpair id="91b1e157-8bf9-4147-ba24-0e40148ac292" name="start_opt"
value="-p 5555"/>
<nvpair id="666a790a-d4ee-4f8e-b306-6a1c28e949e3" name="psql"
value="/usr/local/pgsql/bin/psql"/>
<nvpair id="712b74c5-2d84-4c02-ac78-f41a43f212a1" name="pgdata"
value="/usr/local/pgsql/data_hb"/>
<nvpair id="20cc85f2-3792-40f0-968f-81dc10c165d9" name="pgdba"
value="postgres"/>
<nvpair name="target_role" id="pgsql_5555_target_role"
value="started"/>
<nvpair id="96a6e2f0-31b7-4e17-983c-1ae1a31158e6" name="logfile"
value="/var/log/pg.log"/>
<nvpair id="e3ae93db-f195-4b12-968e-ff66212142d8" name="pgport"
value="5555"/>
</attributes>
</instance_attributes>
<operations>
<op id="cce8767f-f9a4-4cc4-9a81-365eed513275" name="monitor"
interval="30" timeout="30" start_delay="10" disabled="false" role="Started"/>
<op id="e3740989-2763-4009-8399-2437923b0043" name="start"
timeout="120" start_delay="0" disabled="false" role="Started"/>
<op id="58e51f97-156c-4cf6-aba8-1faf61225e16" name="stop"
timeout="120" start_delay="0" disabled="false" role="Started"/>
<op id="36ea9ab4-401e-45c6-8097-8fe7b8a9bfe9" name="status"
timeout="60" start_delay="0" disabled="false" role="Started"/>
</operations>
</primitive>
<master_slave id="ms_pgsql_wal_5556">
<instance_attributes id="ms_pgsql_wal_5556_instance_attrs">
<attributes>
<nvpair id="ms_pgsql_wal_5556_clone_max" name="clone_max"
value="1"/>
<nvpair id="ms_pgsql_wal_5556_clone_node_max"
name="clone_node_max" value="1"/>
<nvpair id="ms_pgsql_wal_5556_master_max" name="master_max"
value="1"/>
<nvpair id="ms_pgsql_wal_5556_master_node_max"
name="master_node_max" value="1"/>
<nvpair id="ms_pgsql_wal_5556_target_role" name="target_role"
value="started"/>
</attributes>
</instance_attributes>
<primitive class="ocf" type="stateful_pgsql" provider="wsi"
id="pgsql_wal_5556">
<instance_attributes id="pgsql_wal_5556_instance_attrs">
<attributes>
<nvpair id="9f2385d3-24f8-439b-90e8-df1f3082f3b1" name="pgctl"
value="/usr/local/pgsql/bin/pg_ctl"/>
<nvpair id="3d50c5f2-86af-4901-8728-f6e5624aa9f6"
name="start_opt" value="-p 5556"/>
<nvpair id="f4261c4d-e05e-4dc1-909c-6103f2480d9d" name="psql"
value="/usr/local/pgsql/bin/psql"/>
<nvpair id="2f0c2ecc-f678-46f4-9b83-f5ea4775a8de" name="pgdata"
value="/usr/local/pgsql/data_hb_wal"/>
<nvpair id="017b40ca-f90c-4553-a356-5baacd5c3ccf" name="pgdba"
value="postgres"/>
<nvpair id="f4ec04a7-e60b-4e40-ac8f-e7641ae31373" name="pgport"
value="5556"/>
<nvpair id="c44e85e6-76b9-408c-8c8f-76a459bd813e" name="pgdb"
value="template1"/>
<nvpair id="6ea6a5e6-92d0-4803-8a12-efb6ebd7921c" name="logfile"
value="/var/log/pg_wal.log"/>
<nvpair id="b736bea0-9377-4eb2-af53-c1f0c15490c2"
name="trigger_file" value="/tmp/postgres.template1.5556"/>
<nvpair name="target_role" id="pgsql_wal_5556:0_target_role"
value="stopped"/>
</attributes>
</instance_attributes>
<operations>
<op id="09448898-c8da-4d2c-97f2-6f6aa1fd7092" name="start"
timeout="120"/>
<op id="34392461-8f23-4d14-b998-71b1c799f1b8" name="stop"
timeout="120"/>
<op id="90682d5b-4439-44f7-8568-1069a21a2b11" name="promote"
timeout="60"/>
<op id="4366022e-f7fc-4bac-a2bb-4b2422cdf5a4" name="demote"
timeout="60"/>
<op id="99030a86-9293-48ce-a9d9-746624683994" name="monitor"
interval="30" timeout="30" start_delay="0"/>
</operations>
</primitive>
</master_slave>
</resources>
<constraints>
<rsc_colocation id="colocation_pgsql_5555" from="pgsql_5555"
to="IPaddr_147_81_84_133" score="INFINITY"/>
<rsc_location id="rsc_location_pgsql_wal_5556" rsc="ms_pgsql_wal_5556">
<rule id="prefered_rsc_location_pgsql_wal_5556" score="100">
<expression attribute="#uname"
id="ef0fac9e-f3da-470b-9aa4-fd11fc694a03" operation="eq" value="arc-dknightlx"/>
</rule>
</rsc_location>
<rsc_location id="rsc_location_IPaddr_147_81_84_133"
rsc="IPaddr_147_81_84_133">
<rule id="prefered_rsc_location_IPaddr_147_81_84_133" score="100">
<expression attribute="#uname"
id="prefered_location_IPaddr_147_81_84_133_expr" operation="eq"
value="arc-dknightlx"/>
</rule>
</rsc_location>
</constraints>
</configuration>
<status>
<node_state uname="arc-dknightlx" crmd="online" in_ccm="true" ha="active"
join="member" id="8c16c69e-f753-49cf-ba89-3ae421940042" shutdown="0"
expected="member" crm-debug-origin="do_update_resource">
<lrm id="8c16c69e-f753-49cf-ba89-3ae421940042">
<lrm_resources>
<lrm_resource id="IPaddr_147_81_84_133" type="IPaddr" class="ocf"
provider="heartbeat">
<lrm_rsc_op id="IPaddr_147_81_84_133_monitor_0"
operation="monitor" crm-debug-origin="do_update_resource"
transition_key="6:2:a6be1dbc-ab05-4aba-8865-4dd033740982"
transition_magic="0:7;6:2:a6be1dbc-ab05-4aba-8865-4dd033740982" call_id="2"
crm_feature_set="1.0.7" rc_code="7" op_status="0" interval="0"
op_digest="26517a2a9fde8bc02319582a3ac78d34"/>
<lrm_rsc_op id="IPaddr_147_81_84_133_start_0" operation="start"
crm-debug-origin="do_update_resource"
transition_key="11:2:a6be1dbc-ab05-4aba-8865-4dd033740982"
transition_magic="0:0;11:2:a6be1dbc-ab05-4aba-8865-4dd033740982" call_id="5"
crm_feature_set="1.0.7" rc_code="0" op_status="0" interval="0"
op_digest="26517a2a9fde8bc02319582a3ac78d34"/>
<lrm_rsc_op id="IPaddr_147_81_84_133_monitor_5000"
operation="monitor" crm-debug-origin="do_update_resource"
transition_key="7:3:a6be1dbc-ab05-4aba-8865-4dd033740982"
transition_magic="0:0;7:3:a6be1dbc-ab05-4aba-8865-4dd033740982" call_id="7"
crm_feature_set="1.0.7" rc_code="0" op_status="0" interval="5000"
op_digest="26517a2a9fde8bc02319582a3ac78d34"/>
</lrm_resource>
<lrm_resource id="pgsql_5555" type="pgsql" class="ocf"
provider="heartbeat">
<lrm_rsc_op id="pgsql_5555_monitor_0" operation="monitor"
crm-debug-origin="do_update_resource"
transition_key="7:2:a6be1dbc-ab05-4aba-8865-4dd033740982"
transition_magic="0:7;7:2:a6be1dbc-ab05-4aba-8865-4dd033740982" call_id="3"
crm_feature_set="1.0.7" rc_code="7" op_status="0" interval="0"
op_digest="eaf9677755ccc70f170079c7447b1aee"/>
<lrm_rsc_op id="pgsql_5555_start_0" operation="start"
crm-debug-origin="do_update_resource"
transition_key="14:2:a6be1dbc-ab05-4aba-8865-4dd033740982"
transition_magic="0:0;14:2:a6be1dbc-ab05-4aba-8865-4dd033740982" call_id="6"
crm_feature_set="1.0.7" rc_code="0" op_status="0" interval="0"
op_digest="eaf9677755ccc70f170079c7447b1aee"/>
<lrm_rsc_op id="pgsql_5555_monitor_30000" operation="monitor"
crm-debug-origin="do_update_resource"
transition_key="10:3:a6be1dbc-ab05-4aba-8865-4dd033740982"
transition_magic="0:0;10:3:a6be1dbc-ab05-4aba-8865-4dd033740982" call_id="8"
crm_feature_set="1.0.7" rc_code="0" op_status="0" interval="30000"
op_digest="eaf9677755ccc70f170079c7447b1aee"/>
</lrm_resource>
<lrm_resource id="pgsql_wal_5556:0" type="stateful_pgsql"
class="ocf" provider="wsi">
<lrm_rsc_op id="pgsql_wal_5556:0_monitor_0" operation="monitor"
crm-debug-origin="do_update_resource"
transition_key="8:2:a6be1dbc-ab05-4aba-8865-4dd033740982"
transition_magic="0:7;8:2:a6be1dbc-ab05-4aba-8865-4dd033740982" call_id="4"
crm_feature_set="1.0.7" rc_code="7" op_status="0" interval="0"
op_digest="bd37d55acc806bd4fd681921029a7520"/>
</lrm_resource>
</lrm_resources>
</lrm>
<transient_attributes id="8c16c69e-f753-49cf-ba89-3ae421940042">
<instance_attributes id="status-8c16c69e-f753-49cf-ba89-3ae421940042">
<attributes>
<nvpair
id="status-8c16c69e-f753-49cf-ba89-3ae421940042-probe_complete"
name="probe_complete" value="true"/>
</attributes>
</instance_attributes>
</transient_attributes>
</node_state>
<node_state uname="arc-tkincaidlx.wsicorp.com" ha="active" crmd="online"
shutdown="0" in_ccm="true" join="member" expected="member"
id="2ba293d2-2c30-4957-ad8d-59ad15bb7e26" crm-debug-origin="do_update_resource">
<transient_attributes id="2ba293d2-2c30-4957-ad8d-59ad15bb7e26">
<instance_attributes id="status-2ba293d2-2c30-4957-ad8d-59ad15bb7e26">
<attributes>
<nvpair
id="status-2ba293d2-2c30-4957-ad8d-59ad15bb7e26-probe_complete"
name="probe_complete" value="true"/>
</attributes>
</instance_attributes>
</transient_attributes>
<lrm id="2ba293d2-2c30-4957-ad8d-59ad15bb7e26">
<lrm_resources>
<lrm_resource id="IPaddr_147_81_84_133" type="IPaddr" class="ocf"
provider="heartbeat">
<lrm_rsc_op id="IPaddr_147_81_84_133_monitor_0"
operation="monitor" crm-debug-origin="build_active_RAs"
transition_key="5:2:141ae0ad-0626-4755-b17f-20b4f84606d9"
transition_magic="4:7;5:2:141ae0ad-0626-4755-b17f-20b4f84606d9" call_id="2"
crm_feature_set="1.0.7" rc_code="7" op_status="4" interval="0"
op_digest="26517a2a9fde8bc02319582a3ac78d34"/>
<lrm_rsc_op id="IPaddr_147_81_84_133_start_0" operation="start"
crm-debug-origin="build_active_RAs"
transition_key="10:27:141ae0ad-0626-4755-b17f-20b4f84606d9"
transition_magic="0:0;10:27:141ae0ad-0626-4755-b17f-20b4f84606d9" call_id="18"
crm_feature_set="1.0.7" rc_code="0" op_status="0" interval="0"
op_digest="26517a2a9fde8bc02319582a3ac78d34"/>
<lrm_rsc_op id="IPaddr_147_81_84_133_monitor_5000"
operation="monitor" crm-debug-origin="build_active_RAs"
transition_key="6:0:a6be1dbc-ab05-4aba-8865-4dd033740982"
transition_magic="0:0;6:0:a6be1dbc-ab05-4aba-8865-4dd033740982" call_id="20"
crm_feature_set="1.0.7" rc_code="0" op_status="0" interval="5000"
op_digest="26517a2a9fde8bc02319582a3ac78d34"/>
<lrm_rsc_op id="IPaddr_147_81_84_133_stop_0" operation="stop"
crm-debug-origin="do_update_resource"
transition_key="10:2:a6be1dbc-ab05-4aba-8865-4dd033740982"
transition_magic="0:0;10:2:a6be1dbc-ab05-4aba-8865-4dd033740982" call_id="23"
crm_feature_set="1.0.7" rc_code="0" op_status="0" interval="0"
op_digest="26517a2a9fde8bc02319582a3ac78d34"/>
</lrm_resource>
<lrm_resource id="pgsql_5555" type="pgsql" class="ocf"
provider="heartbeat">
<lrm_rsc_op id="pgsql_5555_monitor_0" operation="monitor"
crm-debug-origin="build_active_RAs"
transition_key="6:2:141ae0ad-0626-4755-b17f-20b4f84606d9"
transition_magic="4:7;6:2:141ae0ad-0626-4755-b17f-20b4f84606d9" call_id="3"
crm_feature_set="1.0.7" rc_code="7" op_status="4" interval="0"
op_digest="eaf9677755ccc70f170079c7447b1aee"/>
<lrm_rsc_op id="pgsql_5555_start_0" operation="start"
crm-debug-origin="build_active_RAs"
transition_key="13:27:141ae0ad-0626-4755-b17f-20b4f84606d9"
transition_magic="0:0;13:27:141ae0ad-0626-4755-b17f-20b4f84606d9" call_id="19"
crm_feature_set="1.0.7" rc_code="0" op_status="0" interval="0"
op_digest="eaf9677755ccc70f170079c7447b1aee"/>
<lrm_rsc_op id="pgsql_5555_monitor_30000" operation="monitor"
crm-debug-origin="build_active_RAs"
transition_key="9:0:a6be1dbc-ab05-4aba-8865-4dd033740982"
transition_magic="0:0;9:0:a6be1dbc-ab05-4aba-8865-4dd033740982" call_id="21"
crm_feature_set="1.0.7" rc_code="0" op_status="0" interval="30000"
op_digest="eaf9677755ccc70f170079c7447b1aee"/>
<lrm_rsc_op id="pgsql_5555_stop_0" operation="stop"
crm-debug-origin="do_update_resource"
transition_key="13:2:a6be1dbc-ab05-4aba-8865-4dd033740982"
transition_magic="0:0;13:2:a6be1dbc-ab05-4aba-8865-4dd033740982" call_id="25"
crm_feature_set="1.0.7" rc_code="0" op_status="0" interval="0"
op_digest="eaf9677755ccc70f170079c7447b1aee"/>
</lrm_resource>
<lrm_resource id="pgsql_wal_5556:0" type="stateful_pgsql"
class="ocf" provider="wsi">
<lrm_rsc_op id="pgsql_wal_5556:0_monitor_0" operation="monitor"
crm-debug-origin="build_active_RAs"
transition_key="5:14:141ae0ad-0626-4755-b17f-20b4f84606d9"
transition_magic="4:7;5:14:141ae0ad-0626-4755-b17f-20b4f84606d9" call_id="11"
crm_feature_set="1.0.7" rc_code="7" op_status="4" interval="0"
op_digest="bd37d55acc806bd4fd681921029a7520"/>
<lrm_rsc_op id="pgsql_wal_5556:0_start_0" operation="start"
crm-debug-origin="build_active_RAs"
transition_key="11:21:141ae0ad-0626-4755-b17f-20b4f84606d9"
transition_magic="0:0;11:21:141ae0ad-0626-4755-b17f-20b4f84606d9" call_id="12"
crm_feature_set="1.0.7" rc_code="0" op_status="0" interval="0"
op_digest="9de77b451cb1761a72d3b70817f8a634"/>
<lrm_rsc_op id="pgsql_wal_5556:0_stop_0" operation="stop"
crm-debug-origin="build_active_RAs"
transition_key="4:22:141ae0ad-0626-4755-b17f-20b4f84606d9"
transition_magic="0:0;4:22:141ae0ad-0626-4755-b17f-20b4f84606d9" call_id="17"
crm_feature_set="1.0.7" rc_code="0" op_status="0" interval="0"
op_digest="bd37d55acc806bd4fd681921029a7520"/>
</lrm_resource>
</lrm_resources>
</lrm>
</node_state>
</status>
</cib>
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/