I hate to pester, but where are the "fail counts" kept track of, and what maintains them?

They are stored in the status section of the CIB and are maintained by the
tengine process, which increments the count whenever a monitor action fails.
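A minimal Python sketch of the bookkeeping described above. It assumes the count is kept as one per-node, per-resource attribute named "fail-count-<resource>"; that naming, the StatusSection class and the on_monitor_result helper are all hypothetical illustrations, not tengine code. It only models "increment when a monitor result differs from what was expected".

```python
from collections import defaultdict


class StatusSection:
    """Hypothetical in-memory stand-in for the CIB status section.

    Keeps one integer attribute per (node, resource), named
    "fail-count-<rsc>" (an assumed naming, for illustration only).
    """

    def __init__(self):
        self.attrs = defaultdict(dict)  # node -> {attribute name: int}

    def fail_count(self, node, rsc):
        return self.attrs[node].get(f"fail-count-{rsc}", 0)

    def increment(self, node, rsc):
        key = f"fail-count-{rsc}"
        self.attrs[node][key] = self.attrs[node].get(key, 0) + 1


def on_monitor_result(status, node, rsc, rc, expected_rc=0):
    """Bump the fail count only when the monitor rc is not the expected one.

    expected_rc defaults to 0 (success); an rc of 7 on a monitor that
    expects 0 therefore counts as a failure.
    """
    if rc != expected_rc:
        status.increment(node, rsc)


status = StatusSection()
on_monitor_result(status, "roxetta", "pingd-child:0", rc=0)  # success
on_monitor_result(status, "roxetta", "pingd-child:0", rc=7)  # failure
on_monitor_result(status, "roxetta", "pingd-child:0", rc=1)  # failure
print(status.fail_count("roxetta", "pingd-child:0"))  # 2
```

The count lives with the node's status rather than with the resource definition, which is why it survives independently of configuration changes in this model.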

There is also a CLI tool, crm_failcount, that can be used to view and
modify the fail count.
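On a live cluster crm_failcount is the tool to use. Purely as an offline illustration of what "viewing and modifying" means, here is a Python sketch that reads and resets fail-count entries in a saved copy of a status section. The XML layout below (node_state, transient_attributes, nvpair) and the "fail-count-<resource>" naming are assumptions for illustration; the helpers read_fail_counts and set_fail_count are hypothetical, and editing a saved copy does not change the running cluster.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified layout of a status section (illustrative only).
STATUS_XML = """
<status>
  <node_state id="5d54293b-b319-4145-a280-a7d7e2dfd33a" uname="roxetta">
    <transient_attributes>
      <instance_attributes>
        <attributes>
          <nvpair id="fc-0" name="fail-count-pingd-child:0" value="3"/>
        </attributes>
      </instance_attributes>
    </transient_attributes>
  </node_state>
</status>
"""

PREFIX = "fail-count-"


def read_fail_counts(status):
    """Return {(uname, resource): count} for every fail-count nvpair."""
    counts = {}
    for node in status.iter("node_state"):
        uname = node.get("uname")
        for nv in node.iter("nvpair"):
            name = nv.get("name", "")
            if name.startswith(PREFIX):
                counts[(uname, name[len(PREFIX):])] = int(nv.get("value", "0"))
    return counts


def set_fail_count(status, uname, rsc, value):
    """Overwrite an existing fail-count nvpair; return True if one was found."""
    for node in status.iter("node_state"):
        if node.get("uname") != uname:
            continue
        for nv in node.iter("nvpair"):
            if nv.get("name") == PREFIX + rsc:
                nv.set("value", str(value))
                return True
    return False


status = ET.fromstring(STATUS_XML.strip())
print(read_fail_counts(status))  # {('roxetta', 'pingd-child:0'): 3}
set_fail_count(status, "roxetta", "pingd-child:0", 0)
print(read_fail_counts(status))  # {('roxetta', 'pingd-child:0'): 0}
```

Searching with iter("nvpair") rather than a fixed path keeps the sketch tolerant of small differences in nesting between versions.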

There was a bug where the fail count was not always increased; it was
fixed in 2.0.8.


FYI: the resource groups run fine without the pingd additions.


Furthermore, here is a snippet of what I think does not look right. Can anyone
decipher it?

...snip from messages

Apr  7 15:05:19 roxetta crmd: [8723]: WARN: log_data_element: do_lrm_invoke: Bad command <rsc_op id="11" operation="monitor" operation_key="pingd-child:0_monitor_0" on_node="roxetta" on_node_uuid="5d54293b-b319-4145-a280-a7d7e2dfd33a" transition_key="11:0:65528d84-33ba-4a40-a37c-edec19b7cd0c">
Apr  7 15:05:19 roxetta crmd: [8723]: WARN: log_data_element: do_lrm_invoke: Bad command <primitive id="pingd-child:0" long-id="pingd:pingd-child:0" class="OCF" provider="heartbeat" type="pingd"/>
Apr  7 15:05:19 roxetta crmd: [8723]: WARN: log_data_element: do_lrm_invoke: Bad command <attributes crm_feature_set="1.0.7" CRM_meta_timeout="5000" CRM_meta_op_target_rc="7" CRM_meta_clone="0" CRM_meta_clone_max="2" CRM_meta_clone_node_max="1"/>
Apr  7 15:05:19 roxetta crmd: [8723]: WARN: log_data_element: do_lrm_invoke: Bad command </rsc_op>
Apr  7 15:05:19 roxetta crmd: [8723]: WARN: log_data_element: do_lrm_invoke: Bad command <rsc_op id="12" operation="monitor" operation_key="pingd-child:1_monitor_0" on_node="roxetta" on_node_uuid="5d54293b-b319-4145-a280-a7d7e2dfd33a" transition_key="12:0:65528d84-33ba-4a40-a37c-edec19b7cd0c">
Apr  7 15:05:19 roxetta crmd: [8723]: WARN: log_data_element: do_lrm_invoke: Bad command <primitive id="pingd-child:1" long-id="pingd:pingd-child:1" class="OCF" provider="heartbeat" type="pingd"/>
Apr  7 15:05:19 roxetta crmd: [8723]: WARN: log_data_element: do_lrm_invoke: Bad command <attributes crm_feature_set="1.0.7" CRM_meta_timeout="5000" CRM_meta_op_target_rc="7" CRM_meta_clone="1" CRM_meta_clone_max="2" CRM_meta_clone_node_max="1"/>
Apr  7 15:05:19 roxetta crmd: [8723]: WARN: log_data_element: do_lrm_invoke: Bad command </rsc_op>
Apr  7 15:05:19 roxetta crmd: [8723]: WARN: do_log: [[FSA]] Input I_FAIL from get_lrm_resource() received in state (S_TRANSITION_ENGINE)
Apr  7 15:05:19 roxetta crmd: [8723]: info: do_state_transition: roxetta: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_FAIL cause=C_FSA_INTERNAL origin=get_lrm_resource ]
Apr  7 15:05:19 roxetta crmd: [8723]: info: do_state_transition: All 1 cluster nodes are eligable to run resources.
Apr  7 15:05:19 roxetta crmd: [8723]: info: start_subsystem: Starting sub-system "tengine"
Apr  7 15:05:19 roxetta crmd: [8723]: WARN: start_subsystem: Client tengine already running as pid 29971
Apr  7 15:05:19 roxetta crmd: [8723]: info: stop_subsystem: Sent -TERM to tengine: [29971]
Apr  7 15:05:19 roxetta crmd: [8723]: WARN: do_log: [[FSA]] Input I_FAIL from get_lrm_resource() received in state (S_POLICY_ENGINE)
Apr  7 15:05:19 roxetta crmd: [8723]: info: do_state_transition: roxetta: State transition S_POLICY_ENGINE -> S_INTEGRATION [ input=I_FAIL cause=C_FSA_INTERNAL origin=get_lrm_resource ]
Apr  7 15:05:19 roxetta crmd: [8723]: info: update_dc: Set DC to <null> (<null>)
Apr  7 15:05:19 roxetta tengine: [29971]: info: update_abort_priority: Abort priority upgraded to 1000000
Apr  7 15:05:19 roxetta tengine: [29971]: info: update_abort_priority: Abort action 0 superceeded by 3
Apr  7 15:05:19 roxetta crmd: [8723]: info: do_dc_join_offer_all: join-2: Waiting on 1 outstanding join acks


...end snip
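To make the rsc_op entries in the log above easier to read, here is a small Python sketch that pulls one of them apart. It assumes the operation key has the form <resource>_<action>_<interval> and the transition key the form <action id>:<transition id>:<uuid>, which matches the values shown (id="11" and transition_key "11:0:..."). The helpers split_op_key and decode are hypothetical, and the OCF_RC table lists only a few standard OCF return codes. It just lays out the fields; it does not diagnose the "Bad command" warnings.

```python
import xml.etree.ElementTree as ET

# The first rsc_op from the log above, reassembled into one element.
RSC_OP = """<rsc_op id="11" operation="monitor"
 operation_key="pingd-child:0_monitor_0" on_node="roxetta"
 on_node_uuid="5d54293b-b319-4145-a280-a7d7e2dfd33a"
 transition_key="11:0:65528d84-33ba-4a40-a37c-edec19b7cd0c">
 <primitive id="pingd-child:0" long-id="pingd:pingd-child:0" class="OCF"
  provider="heartbeat" type="pingd"/>
 <attributes crm_feature_set="1.0.7" CRM_meta_timeout="5000"
  CRM_meta_op_target_rc="7" CRM_meta_clone="0" CRM_meta_clone_max="2"
  CRM_meta_clone_node_max="1"/>
</rsc_op>"""

# A subset of the standard OCF return codes.
OCF_RC = {0: "OCF_SUCCESS", 1: "OCF_ERR_GENERIC", 7: "OCF_NOT_RUNNING"}


def split_op_key(key):
    """Split '<rsc>_<action>_<interval>' from the right.

    Splitting from the right tolerates '_' in the resource id, but
    assumes the action name itself contains no '_'.
    """
    rsc, action, interval = key.rsplit("_", 2)
    return rsc, action, int(interval)


def decode(op_xml):
    op = ET.fromstring(op_xml)
    rsc, action, interval = split_op_key(op.get("operation_key"))
    action_id, transition_id, _te_uuid = op.get("transition_key").split(":", 2)
    prim = op.find("primitive")
    target_rc = int(op.find("attributes").get("CRM_meta_op_target_rc"))
    return {
        "resource": rsc,
        "action": action,
        "interval": interval,
        # A one-off monitor (interval 0) is a probe of the current state.
        "probe": action == "monitor" and interval == 0,
        "action_id": int(action_id),
        "transition": int(transition_id),
        "agent": f"{prim.get('class')}:{prim.get('provider')}:{prim.get('type')}",
        "expected_rc": OCF_RC.get(target_rc, str(target_rc)),
    }


for k, v in decode(RSC_OP).items():
    print(f"{k:12} {v}")
```

Read this way, both failing entries are one-off monitor probes of the two pingd clone instances, each expecting "not running" (rc 7) before anything has been started.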


_______________________________________________
Linux-HA mailing list
[EMAIL PROTECTED]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems