I hate to pester, but where are the "fail counts" kept, and what maintains them?
They are stored in the status section and are maintained by the tengine process, which increments the count whenever a monitor action fails. There is also a CLI tool called crm_failcount that can be used to view and modify the fail count. There was a bug where the fail count was not always incremented; this was fixed in 2.0.8.
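For reference, a sketch of how crm_failcount might be used to inspect and reset a fail count. The resource name "my_resource" and node name "node1" are placeholders, and the exact option letters may differ between releases, so check crm_failcount --help on your version:

```shell
# Query the current fail count for a resource on a given node
# ("my_resource" and "node1" are placeholder names).
crm_failcount -G -U node1 -r my_resource

# Reset the fail count once the underlying problem has been fixed,
# so the policy engine stops penalizing the resource on that node.
crm_failcount -D -U node1 -r my_resource
```

Resetting the count matters because accumulated failures influence where the policy engine is willing to place the resource.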
FYI: the resource groups run fine without the pingd additions. Furthermore, here is a snip of what I think does not look right. Can anyone decipher?

...snip from messages
Apr 7 15:05:19 roxetta crmd: [8723]: WARN: log_data_element: do_lrm_invoke: Bad command <rsc_op id="11" operation="monitor" operation_key="pingd-child:0_monitor_0" on_node="roxetta" on_node_uuid="5d54293b-b319-4145-a280-a7d7e2dfd33a" transition_key="11:0:65528d84-33ba-4a40-a37c-edec19b7cd0c">
Apr 7 15:05:19 roxetta crmd: [8723]: WARN: log_data_element: do_lrm_invoke: Bad command <primitive id="pingd-child:0" long-id="pingd:pingd-child:0" class="OCF" provider="heartbeat" type="pingd"/>
Apr 7 15:05:19 roxetta crmd: [8723]: WARN: log_data_element: do_lrm_invoke: Bad command <attributes crm_feature_set="1.0.7" CRM_meta_timeout="5000" CRM_meta_op_target_rc="7" CRM_meta_clone="0" CRM_meta_clone_max="2" CRM_meta_clone_node_max="1"/>
Apr 7 15:05:19 roxetta crmd: [8723]: WARN: log_data_element: do_lrm_invoke: Bad command </rsc_op>
Apr 7 15:05:19 roxetta crmd: [8723]: WARN: log_data_element: do_lrm_invoke: Bad command <rsc_op id="12" operation="monitor" operation_key="pingd-child:1_monitor_0" on_node="roxetta" on_node_uuid="5d54293b-b319-4145-a280-a7d7e2dfd33a" transition_key="12:0:65528d84-33ba-4a40-a37c-edec19b7cd0c">
Apr 7 15:05:19 roxetta crmd: [8723]: WARN: log_data_element: do_lrm_invoke: Bad command <primitive id="pingd-child:1" long-id="pingd:pingd-child:1" class="OCF" provider="heartbeat" type="pingd"/>
Apr 7 15:05:19 roxetta crmd: [8723]: WARN: log_data_element: do_lrm_invoke: Bad command <attributes crm_feature_set="1.0.7" CRM_meta_timeout="5000" CRM_meta_op_target_rc="7" CRM_meta_clone="1" CRM_meta_clone_max="2" CRM_meta_clone_node_max="1"/>
Apr 7 15:05:19 roxetta crmd: [8723]: WARN: log_data_element: do_lrm_invoke: Bad command </rsc_op>
Apr 7 15:05:19 roxetta crmd: [8723]: WARN: do_log: [[FSA]] Input I_FAIL from get_lrm_resource() received in state (S_TRANSITION_ENGINE)
Apr 7 15:05:19 roxetta crmd: [8723]: info: do_state_transition: roxetta: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_FAIL cause=C_FSA_INTERNAL origin=get_lrm_resource ]
Apr 7 15:05:19 roxetta crmd: [8723]: info: do_state_transition: All 1 cluster nodes are eligable to run resources.
Apr 7 15:05:19 roxetta crmd: [8723]: info: start_subsystem: Starting sub-system "tengine"
Apr 7 15:05:19 roxetta crmd: [8723]: WARN: start_subsystem: Client tengine already running as pid 29971
Apr 7 15:05:19 roxetta crmd: [8723]: info: stop_subsystem: Sent -TERM to tengine: [29971]
Apr 7 15:05:19 roxetta crmd: [8723]: WARN: do_log: [[FSA]] Input I_FAIL from get_lrm_resource() received in state (S_POLICY_ENGINE)
Apr 7 15:05:19 roxetta crmd: [8723]: info: do_state_transition: roxetta: State transition S_POLICY_ENGINE -> S_INTEGRATION [ input=I_FAIL cause=C_FSA_INTERNAL origin=get_lrm_resource ]
Apr 7 15:05:19 roxetta crmd: [8723]: info: update_dc: Set DC to <null> (<null>)
Apr 7 15:05:19 roxetta tengine: [29971]: info: update_abort_priority: Abort priority upgraded to 1000000
Apr 7 15:05:19 roxetta tengine: [29971]: info: update_abort_priority: Abort action 0 superceeded by 3
Apr 7 15:05:19 roxetta crmd: [8723]: info: do_dc_join_offer_all: join-2: Waiting on 1 outstanding join acks
...end snip

_______________________________________________
Linux-HA mailing list
[EMAIL PROTECTED]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
