I have been working on a simple two-node, two-resource cluster running
Pacemaker 1.0.7 and Heartbeat 3.0.2. The two resources are IPaddr and
our application. Whenever our application was started, the box would
reboot (actually a clean restart). After a lot of searching I found that
everything started perfectly if I didn't initialize Net-SNMP. The build
of Net-SNMP we use writes two or three lines to stderr when it starts,
and I noticed that the reboot occurred right after the first of those
lines was printed; no further output appeared after that. My OCF script
started our app with the command '/BACKHAUL/bhApplication >/dev/null
&'. I changed it to '/BACKHAUL/bhApplication &>/dev/null &' (silencing
stderr as well as stdout) and everything now works as it should. So the
question is: why does the HA software cause the box to reboot when
something is written to stderr? I'm not even sure which component
triggered the reboot.
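For anyone puzzling over the two redirections: '>/dev/null' silences only stdout (fd 1), so the daemon's stderr stays connected to whatever file descriptor the lrmd handed the resource agent. Bash's '&>/dev/null' redirects both streams; the portable POSIX spelling is '>/dev/null 2>&1'. Here is a small sketch that shows the difference (fake_daemon is a stand-in I made up for illustration, not our real binary):

```shell
#!/bin/sh
# Stand-in (hypothetical) for an app that, like our Net-SNMP build,
# writes to both streams on startup.
fake_daemon() {
    echo "normal output"            # fd 1 (stdout)
    echo "startup warning" >&2      # fd 2 (stderr)
}

# '>/dev/null' redirects only stdout; the stderr line escapes and is
# captured here by the surrounding group's 2>&1.
stderr_leak=$( { fake_daemon >/dev/null; } 2>&1 )

# '>/dev/null 2>&1' (POSIX; equivalent to bash's '&>/dev/null')
# sends both streams to /dev/null, so nothing escapes.
stderr_quiet=$( { fake_daemon >/dev/null 2>&1; } 2>&1 )

echo "leak=[$stderr_leak]"
echo "quiet=[$stderr_quiet]"
```

Running it prints leak=[startup warning] and quiet=[], which matches what I saw: with the original redirection the lrmd still receives the stray stderr lines (visible below as the "RA output: (bhApp:start:stderr)" entry in the log).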

I have included the log from a session in which the box rebooted.

Jan  1 00:10:51 bh130 syslog.info syslogd started: BusyBox v1.15.3
Jan  1 00:14:04 bh130 daemon.warn lrmd: [1145]: WARN: Initializing
connection to logging daemon failed. Logging daemon may not be running
Jan  1 00:14:04 bh130 daemon.info lrmd: [1145]: info:
G_main_add_SignalHandler: Added signal handler for signal 15
Jan  1 00:14:07 bh130 daemon.warn ccm: [1143]: WARN: Initializing
connection to logging daemon failed. Logging daemon may not be running
Jan  1 00:14:07 bh130 daemon.err ccm: [1143]: ERROR: Cannot chdir to
[/usr/var/lib/heartbeat/cores/hacluster]: No such file or directory
Jan  1 00:14:07 bh130 daemon.info ccm: [1143]: info: Hostname: bh130
Jan  1 00:14:15 bh130 daemon.warn stonithd: [1146]: WARN: Initializing
connection to logging daemon failed. Logging daemon may not be running
Jan  1 00:14:15 bh130 daemon.warn stonithd: [1146]: WARN: Core dumps
could be lost if multiple dumps occur.
Jan  1 00:14:15 bh130 daemon.warn stonithd: [1146]: WARN: Consider
setting non-default value in /proc/sys/kernel/core_pattern (or
equivalent) for maximum supportability
Jan  1 00:14:15 bh130 daemon.warn stonithd: [1146]: WARN: Consider
setting /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
supportability
Jan  1 00:14:15 bh130 daemon.info stonithd: [1146]: info:
G_main_add_SignalHandler: Added signal handler for signal 10
Jan  1 00:14:15 bh130 daemon.info stonithd: [1146]: info:
G_main_add_SignalHandler: Added signal handler for signal 12
Jan  1 00:14:15 bh130 daemon.debug ccm: [1143]: debug: quorum plugin:
majority
Jan  1 00:14:15 bh130 daemon.debug ccm: [1143]: debug: cluster:linux-ha,
member_count=1, member_quorum_votes=100
Jan  1 00:14:15 bh130 daemon.debug ccm: [1143]: debug:
total_node_count=2, total_quorum_votes=200
Jan  1 00:14:15 bh130 daemon.debug ccm: [1143]: debug: quorum plugin:
twonodes
Jan  1 00:14:15 bh130 daemon.debug ccm: [1143]: debug: cluster:linux-ha,
member_count=1, member_quorum_votes=100
Jan  1 00:14:15 bh130 daemon.debug ccm: [1143]: debug:
total_node_count=2, total_quorum_votes=200
Jan  1 00:14:15 bh130 daemon.info ccm: [1143]: info: Break tie for 2
nodes cluster
Jan  1 00:14:15 bh130 daemon.info ccm: [1143]: info:
G_main_add_SignalHandler: Added signal handler for signal 15
Jan  1 00:14:16 bh130 daemon.info stonithd: [1146]: info:
crm_cluster_connect: Connecting to Heartbeat
Jan  1 00:14:16 bh130 daemon.info stonithd: [1146]: info:
register_heartbeat_conn: Hostname: bh130
Jan  1 00:14:16 bh130 daemon.info stonithd: [1146]: info:
register_heartbeat_conn: UUID: 4e5638b3-0baf-44f3-96bc-f8efd6c10595
Jan  1 00:14:16 bh130 daemon.notice stonithd: [1146]: notice:
/usr/lib/heartbeat/stonithd start up successfully.
Jan  1 00:14:16 bh130 daemon.info stonithd: [1146]: info:
G_main_add_SignalHandler: Added signal handler for signal 17
Jan  1 00:14:18 bh130 daemon.info lrmd: [1145]: info:
G_main_add_SignalHandler: Added signal handler for signal 17
Jan  1 00:14:18 bh130 daemon.info lrmd: [1145]: info: enabling coredumps
Jan  1 00:14:18 bh130 daemon.warn lrmd: [1145]: WARN: Core dumps could
be lost if multiple dumps occur.
Jan  1 00:14:18 bh130 daemon.warn lrmd: [1145]: WARN: Consider setting
non-default value in /proc/sys/kernel/core_pattern (or equivalent) for
maximum supportability
Jan  1 00:14:18 bh130 daemon.warn lrmd: [1145]: WARN: Consider setting
/proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
supportability
Jan  1 00:14:18 bh130 daemon.info lrmd: [1145]: info:
G_main_add_SignalHandler: Added signal handler for signal 10
Jan  1 00:14:18 bh130 daemon.info lrmd: [1145]: info:
G_main_add_SignalHandler: Added signal handler for signal 12
Jan  1 00:14:18 bh130 daemon.info lrmd: [1145]: info: Started.
Jan  1 00:14:23 bh130 daemon.err crmd: [1148]: ERROR: crm_log_init:
Cannot change active directory to
/usr/var/lib/heartbeat/cores/hacluster: No such file or directory (2)
Jan  1 00:14:23 bh130 daemon.err cib: [1144]: ERROR: crm_log_init:
Cannot change active directory to
/usr/var/lib/heartbeat/cores/hacluster: No such file or directory (2)
Jan  1 00:14:23 bh130 daemon.warn cib: [1144]: WARN: Initializing
connection to logging daemon failed. Logging daemon may not be running
Jan  1 00:14:23 bh130 daemon.info cib: [1144]: info: Invoked:
/usr/lib/heartbeat/cib
Jan  1 00:14:23 bh130 daemon.info cib: [1144]: info:
G_main_add_TriggerHandler: Added signal manual handler
Jan  1 00:14:23 bh130 daemon.info cib: [1144]: info:
G_main_add_SignalHandler: Added signal handler for signal 17
Jan  1 00:14:23 bh130 daemon.err attrd: [1147]: ERROR: crm_log_init:
Cannot change active directory to
/usr/var/lib/heartbeat/cores/hacluster: No such file or directory (2)
Jan  1 00:14:23 bh130 daemon.warn attrd: [1147]: WARN: Initializing
connection to logging daemon failed. Logging daemon may not be running
Jan  1 00:14:23 bh130 daemon.warn crmd: [1148]: WARN: Initializing
connection to logging daemon failed. Logging daemon may not be running
Jan  1 00:14:23 bh130 daemon.info cib: [1144]: info: retrieveCib:
Reading cluster configuration from: /usr/var/lib/heartbeat/crm/cib.xml
(digest: /usr/var/lib/heartbeat/crm/cib.xml.sig)
Jan  1 00:14:23 bh130 daemon.info crmd: [1148]: info: Invoked:
/usr/lib/heartbeat/crmd
Jan  1 00:14:23 bh130 daemon.info attrd: [1147]: info: Invoked:
/usr/lib/heartbeat/attrd
Jan  1 00:14:23 bh130 daemon.info crmd: [1148]: info: main: CRM Hg
Version: 2eed906f43e90ee1e0f7d411f814fc585b30f869
Jan  1 00:14:23 bh130 daemon.info attrd: [1147]: info: main: Starting up
Jan  1 00:14:23 bh130 daemon.info attrd: [1147]: info:
crm_cluster_connect: Connecting to Heartbeat
Jan  1 00:14:23 bh130 daemon.info crmd: [1148]: info: crmd_init:
Starting crmd
Jan  1 00:14:23 bh130 daemon.info crmd: [1148]: info:
G_main_add_SignalHandler: Added signal handler for signal 17
Jan  1 00:14:23 bh130 daemon.info attrd: [1147]: info:
register_heartbeat_conn: Hostname: bh130
Jan  1 00:14:23 bh130 daemon.info attrd: [1147]: info:
register_heartbeat_conn: UUID: 4e5638b3-0baf-44f3-96bc-f8efd6c10595
Jan  1 00:14:23 bh130 daemon.info attrd: [1147]: info: main: Cluster
connection active
Jan  1 00:14:23 bh130 daemon.info attrd: [1147]: info: main: Accepting
attribute updates
Jan  1 00:14:23 bh130 daemon.info attrd: [1147]: info: main: Starting
mainloop...
Jan  1 00:14:24 bh130 daemon.info cib: [1144]: info: startCib: CIB
Initialization completed successfully
Jan  1 00:14:24 bh130 daemon.info cib: [1144]: info:
crm_cluster_connect: Connecting to Heartbeat
Jan  1 00:14:24 bh130 daemon.info crmd: [1148]: info: do_cib_control:
Could not connect to the CIB service: connection failed
Jan  1 00:14:24 bh130 daemon.warn crmd: [1148]: WARN: do_cib_control:
Couldn't complete CIB registration 1 times... pause and retry
Jan  1 00:14:24 bh130 daemon.info crmd: [1148]: info: crmd_init:
Starting crmd's mainloop
Jan  1 00:14:24 bh130 daemon.info cib: [1144]: info:
register_heartbeat_conn: Hostname: bh130
Jan  1 00:14:24 bh130 daemon.info cib: [1144]: info:
register_heartbeat_conn: UUID: 4e5638b3-0baf-44f3-96bc-f8efd6c10595
Jan  1 00:14:24 bh130 daemon.info cib: [1144]: info: ccm_connect:
Registering with CCM...
Jan  1 00:14:24 bh130 daemon.info cib: [1144]: info: cib_init:
Requesting the list of configured nodes
Jan  1 00:14:24 bh130 daemon.info cib: [1144]: info: cib_init: Starting
cib mainloop
Jan  1 00:14:24 bh130 daemon.info cib: [1144]: info:
cib_client_status_callback: Status update: Client bh130/cib now has
status [join]
Jan  1 00:14:24 bh130 daemon.info cib: [1144]: info: crm_new_peer: Node
0 is now known as bh130
Jan  1 00:14:24 bh130 daemon.info cib: [1144]: info:
crm_update_peer_proc: bh130.cib is now online
Jan  1 00:14:24 bh130 daemon.info cib: [1144]: info: mem_handle_event:
Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
Jan  1 00:14:24 bh130 daemon.info cib: [1144]: info: mem_handle_event:
instance=1, nodes=1, new=1, lost=0, n_idx=0, new_idx=0, old_idx=3
Jan  1 00:14:24 bh130 daemon.info cib: [1144]: info:
cib_ccm_msg_callback: Processing CCM event=NEW MEMBERSHIP (id=1)
Jan  1 00:14:24 bh130 daemon.info cib: [1144]: info: crm_get_peer: Node
bh130 now has id: 1
Jan  1 00:14:24 bh130 daemon.info cib: [1144]: info: crm_update_peer:
Node bh130: id=1 state=member (new) addr=(null) votes=-1 born=1 seen=1
proc=00000000000000000000000000000100
Jan  1 00:14:24 bh130 daemon.info cib: [1144]: info:
crm_update_peer_proc: bh130.ais is now online
Jan  1 00:14:24 bh130 daemon.info cib: [1144]: info:
crm_update_peer_proc: bh130.crmd is now online
Jan  1 00:14:25 bh130 daemon.info cib: [1144]: info:
cib_client_status_callback: Status update: Client bh130/cib now has
status [online]
Jan  1 00:14:25 bh130 daemon.info cib: [1149]: info: write_cib_contents:
Archived previous version as /usr/var/lib/heartbeat/crm/cib-92.raw
Jan  1 00:14:25 bh130 daemon.info cib: [1149]: info: write_cib_contents:
Wrote version 0.130.0 of the CIB to disk (digest:
ff150923e18c8ba4dc1ab01963555751)
Jan  1 00:14:25 bh130 daemon.info cib: [1149]: info: retrieveCib:
Reading cluster configuration from:
/usr/var/lib/heartbeat/crm/cib.mSE2WR (digest:
/usr/var/lib/heartbeat/crm/cib.EL1cKz)
Jan  1 00:14:26 bh130 daemon.info crmd: [1148]: info: crm_timer_popped:
Wait Timer (I_NULL) just popped!
Jan  1 00:14:26 bh130 daemon.info crmd: [1148]: info: do_cib_control:
CIB connection established
Jan  1 00:14:26 bh130 daemon.info crmd: [1148]: info:
crm_cluster_connect: Connecting to Heartbeat
Jan  1 00:14:26 bh130 daemon.info crmd: [1148]: info:
register_heartbeat_conn: Hostname: bh130
Jan  1 00:14:26 bh130 daemon.info crmd: [1148]: info:
register_heartbeat_conn: UUID: 4e5638b3-0baf-44f3-96bc-f8efd6c10595
Jan  1 00:14:27 bh130 daemon.info crmd: [1148]: info: do_ha_control:
Connected to the cluster
Jan  1 00:14:27 bh130 daemon.info crmd: [1148]: info: do_ccm_control:
CCM connection established... waiting for first callback
Jan  1 00:14:27 bh130 daemon.info crmd: [1148]: info: do_started:
Delaying start, CCM (0000000000100000) not connected
Jan  1 00:14:27 bh130 daemon.info crmd: [1148]: info:
config_query_callback: Checking for expired actions every 900000ms
Jan  1 00:14:27 bh130 daemon.notice crmd: [1148]: notice:
crmd_client_status_callback: Status update: Client bh130/crmd now has
status [online] (DC=false)
Jan  1 00:14:27 bh130 daemon.info crmd: [1148]: info: crm_new_peer: Node
0 is now known as bh130
Jan  1 00:14:27 bh130 daemon.info crmd: [1148]: info:
crm_update_peer_proc: bh130.crmd is now online
Jan  1 00:14:27 bh130 daemon.info crmd: [1148]: info:
crmd_client_status_callback: Not the DC
Jan  1 00:14:27 bh130 daemon.notice crmd: [1148]: notice:
crmd_client_status_callback: Status update: Client bh130/crmd now has
status [online] (DC=false)
Jan  1 00:14:28 bh130 daemon.info crmd: [1148]: info:
crmd_client_status_callback: Not the DC
Jan  1 00:14:28 bh130 daemon.info crmd: [1148]: info: mem_handle_event:
Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
Jan  1 00:14:28 bh130 daemon.info crmd: [1148]: info: mem_handle_event:
instance=1, nodes=1, new=1, lost=0, n_idx=0, new_idx=0, old_idx=3
Jan  1 00:14:28 bh130 daemon.info crmd: [1148]: info:
crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=1)
Jan  1 00:14:28 bh130 daemon.info crmd: [1148]: info: ccm_event_detail:
NEW MEMBERSHIP: trans=1, nodes=1, new=1, lost=0 n_idx=0, new_idx=0,
old_idx=3
Jan  1 00:14:28 bh130 daemon.info crmd: [1148]: info: ccm_event_detail:
    CURRENT: bh130 [nodeid=1, born=1]
Jan  1 00:14:28 bh130 daemon.info crmd: [1148]: info: ccm_event_detail:
    NEW:     bh130 [nodeid=1, born=1]
Jan  1 00:14:28 bh130 daemon.info crmd: [1148]: info: crm_get_peer: Node
bh130 now has id: 1
Jan  1 00:14:28 bh130 daemon.info crmd: [1148]: info: crm_update_peer:
Node bh130: id=1 state=member (new) addr=(null) votes=-1 born=1 seen=1
proc=00000000000000000000000000000200
Jan  1 00:14:28 bh130 daemon.info crmd: [1148]: info:
crm_update_peer_proc: bh130.ais is now online
Jan  1 00:14:28 bh130 daemon.info crmd: [1148]: info: do_started: The
local CRM is operational
Jan  1 00:14:28 bh130 daemon.info crmd: [1148]: info:
do_state_transition: State transition S_STARTING -> S_PENDING [
input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
Jan  1 00:14:28 bh130 daemon.info attrd: [1147]: info: cib_connect:
Connected to the CIB after 1 signon attempts
Jan  1 00:14:28 bh130 daemon.info attrd: [1147]: info: cib_connect:
Sending full refresh
Jan  1 00:15:29 bh130 daemon.info crmd: [1148]: info: crm_timer_popped:
Election Trigger (I_DC_TIMEOUT) just popped!
Jan  1 00:15:29 bh130 daemon.warn crmd: [1148]: WARN: do_log: FSA: Input
I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
Jan  1 00:15:29 bh130 daemon.info crmd: [1148]: info:
do_state_transition: State transition S_PENDING -> S_ELECTION [
input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]
Jan  1 00:15:29 bh130 daemon.info crmd: [1148]: info:
do_state_transition: State transition S_ELECTION -> S_INTEGRATION [
input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
Jan  1 00:15:29 bh130 daemon.info crmd: [1148]: info: do_te_control:
Registering TE UUID: fb4c1819-0064-4651-a330-a2071dc4e495
Jan  1 00:15:29 bh130 daemon.warn crmd: [1148]: WARN:
cib_client_add_notify_callback: Callback already present
Jan  1 00:15:29 bh130 daemon.info crmd: [1148]: info:
set_graph_functions: Setting custom graph functions
Jan  1 00:15:29 bh130 daemon.info crmd: [1148]: info: unpack_graph:
Unpacked transition -1: 0 actions in 0 synapses
Jan  1 00:15:29 bh130 daemon.info crmd: [1148]: info: start_subsystem:
Starting sub-system "pengine"
Jan  1 00:15:31 bh130 daemon.err pengine: [1150]: ERROR: crm_log_init:
Cannot change active directory to
/usr/var/lib/heartbeat/cores/hacluster: No such file or directory (2)
Jan  1 00:15:31 bh130 daemon.warn pengine: [1150]: WARN: Initializing
connection to logging daemon failed. Logging daemon may not be running
Jan  1 00:15:31 bh130 daemon.info pengine: [1150]: info: Invoked:
/usr/lib/heartbeat/pengine
Jan  1 00:15:31 bh130 daemon.info pengine: [1150]: info: main: Starting
pengine
Jan  1 00:15:32 bh130 daemon.info crmd: [1148]: info: do_dc_takeover:
Taking over DC status for this partition
Jan  1 00:15:32 bh130 daemon.info cib: [1144]: info:
cib_process_readwrite: We are now in R/W mode
Jan  1 00:15:32 bh130 daemon.info cib: [1144]: info:
cib_process_request: Operation complete: op cib_master for section 'all'
(origin=local/crmd/6, version=0.130.0): ok (rc=0)
Jan  1 00:15:32 bh130 daemon.info cib: [1144]: info:
cib_process_request: Operation complete: op cib_modify for section cib
(origin=local/crmd/7, version=0.130.0): ok (rc=0)
Jan  1 00:15:32 bh130 daemon.info cib: [1144]: info:
cib_process_request: Operation complete: op cib_modify for section
crm_config (origin=local/crmd/9, version=0.130.0): ok (rc=0)
Jan  1 00:15:32 bh130 daemon.info crmd: [1148]: info: join_make_offer:
Making join offers based on membership 1
Jan  1 00:15:32 bh130 daemon.info crmd: [1148]: info:
do_dc_join_offer_all: join-1: Waiting on 1 outstanding join acks
Jan  1 00:15:32 bh130 daemon.info crmd: [1148]: info:
te_connect_stonith: Attempting connection to fencing daemon...
Jan  1 00:15:32 bh130 daemon.info cib: [1144]: info:
cib_process_request: Operation complete: op cib_modify for section
crm_config (origin=local/crmd/11, version=0.130.0): ok (rc=0)
Jan  1 00:15:33 bh130 daemon.info crmd: [1148]: info:
te_connect_stonith: Connected
Jan  1 00:15:33 bh130 daemon.info crmd: [1148]: info:
config_query_callback: Checking for expired actions every 900000ms
Jan  1 00:15:33 bh130 daemon.info crmd: [1148]: info: update_dc: Set DC
to bh130 (3.0.1)
Jan  1 00:15:33 bh130 daemon.info crmd: [1148]: info:
do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [
input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
Jan  1 00:15:33 bh130 daemon.info crmd: [1148]: info:
do_state_transition: All 1 cluster nodes responded to the join offer.
Jan  1 00:15:33 bh130 daemon.info crmd: [1148]: info:
do_dc_join_finalize: join-1: Syncing the CIB from bh130 to the rest of
the cluster
Jan  1 00:15:33 bh130 daemon.info cib: [1144]: info:
cib_process_request: Operation complete: op cib_sync for section 'all'
(origin=local/crmd/14, version=0.130.0): ok (rc=0)
Jan  1 00:15:33 bh130 daemon.info cib: [1144]: info:
cib_process_request: Operation complete: op cib_modify for section nodes
(origin=local/crmd/15, version=0.130.0): ok (rc=0)
Jan  1 00:15:34 bh130 daemon.info crmd: [1148]: info: update_attrd:
Connecting to attrd...
Jan  1 00:15:34 bh130 daemon.info attrd: [1147]: info: find_hash_entry:
Creating hash entry for terminate
Jan  1 00:15:34 bh130 daemon.info attrd: [1147]: info: find_hash_entry:
Creating hash entry for shutdown
Jan  1 00:15:34 bh130 daemon.info crmd: [1148]: info: do_dc_join_ack:
join-1: Updating node state to member for bh130
Jan  1 00:15:34 bh130 daemon.info cib: [1144]: info:
cib_process_request: Operation complete: op cib_delete for section
//node_sta...@uname='bh130']/transient_attributes (origin=local/crmd/16,
version=0.130.0): ok (rc=0)
Jan  1 00:15:34 bh130 daemon.info crmd: [1148]: info:
erase_xpath_callback: Deletion of
"//node_sta...@uname='bh130']/transient_attributes": ok (rc=0)
Jan  1 00:15:34 bh130 daemon.info cib: [1144]: info:
cib_process_request: Operation complete: op cib_delete for section
//node_sta...@uname='bh130']/lrm (origin=local/crmd/17,
version=0.130.0): ok (rc=0)
Jan  1 00:15:34 bh130 daemon.info crmd: [1148]: info:
erase_xpath_callback: Deletion of "//node_sta...@uname='bh130']/lrm": ok
(rc=0)
Jan  1 00:15:35 bh130 daemon.info cib: [1144]: info:
cib_process_request: Operation complete: op cib_delete for section
//node_sta...@uname='bh130']/lrm (origin=local/crmd/18,
version=0.130.0): ok (rc=0)
Jan  1 00:15:35 bh130 daemon.info crmd: [1148]: info:
erase_xpath_callback: Deletion of "//node_sta...@uname='bh130']/lrm": ok
(rc=0)
Jan  1 00:15:35 bh130 daemon.info crmd: [1148]: info:
do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE
[ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
Jan  1 00:15:35 bh130 daemon.info crmd: [1148]: info:
populate_cib_nodes_ha: Requesting the list of configured nodes
Jan  1 00:15:36 bh130 daemon.warn crmd: [1148]: WARN: get_uuid: Could
not calculate UUID for bh106
Jan  1 00:15:36 bh130 daemon.warn crmd: [1148]: WARN:
populate_cib_nodes_ha: Node bh106: no uuid found
Jan  1 00:15:36 bh130 daemon.info crmd: [1148]: info:
do_state_transition: All 1 cluster nodes are eligible to run resources.
Jan  1 00:15:36 bh130 daemon.info crmd: [1148]: info: do_dc_join_final:
Ensuring DC, quorum and node attributes are up-to-date
Jan  1 00:15:36 bh130 daemon.info attrd: [1147]: info:
attrd_local_callback: Sending full refresh (origin=crmd)
Jan  1 00:15:36 bh130 daemon.info attrd: [1147]: info:
attrd_trigger_update: Sending flush op to all hosts for: shutdown (<null>)
Jan  1 00:15:36 bh130 daemon.info crmd: [1148]: info: crm_update_quorum:
Updating quorum status to true (call=22)
Jan  1 00:15:36 bh130 daemon.info crmd: [1148]: info:
abort_transition_graph: do_te_invoke:191 - Triggered transition abort
(complete=1) : Peer Cancelled
Jan  1 00:15:36 bh130 daemon.info crmd: [1148]: info: do_pe_invoke:
Query 23: Requesting the current CIB: S_POLICY_ENGINE
Jan  1 00:15:36 bh130 daemon.info cib: [1144]: info:
cib_process_request: Operation complete: op cib_modify for section nodes
(origin=local/crmd/20, version=0.130.1): ok (rc=0)
Jan  1 00:15:36 bh130 daemon.info crmd: [1148]: info:
abort_transition_graph: need_abort:59 - Triggered transition abort
(complete=1) : Non-status change
Jan  1 00:15:36 bh130 daemon.info crmd: [1148]: info: need_abort:
Aborting on change to admin_epoch
Jan  1 00:15:36 bh130 daemon.info crmd: [1148]: info: do_pe_invoke:
Query 24: Requesting the current CIB: S_POLICY_ENGINE
Jan  1 00:15:36 bh130 daemon.info cib: [1144]: info: log_data_element:
cib:diff: - <cib admin_epoch="0" epoch="130" num_updates="1" />
Jan  1 00:15:36 bh130 daemon.info cib: [1144]: info: log_data_element:
cib:diff: + <cib dc-uuid="4e5638b3-0baf-44f3-96bc-f8efd6c10595"
admin_epoch="0" epoch="131" num_updates="1" />
Jan  1 00:15:36 bh130 daemon.info cib: [1144]: info:
cib_process_request: Operation complete: op cib_modify for section cib
(origin=local/crmd/22, version=0.131.1): ok (rc=0)
Jan  1 00:15:36 bh130 daemon.info crmd: [1148]: info:
do_pe_invoke_callback: Invoking the PE: query=24, ref=pe_calc-dc-936-7,
seq=1, quorate=1
Jan  1 00:15:36 bh130 daemon.notice pengine: [1150]: notice:
unpack_config: On loss of CCM Quorum: Ignore
Jan  1 00:15:36 bh130 daemon.info attrd: [1147]: info:
attrd_trigger_update: Sending flush op to all hosts for: terminate (<null>)
Jan  1 00:15:36 bh130 daemon.info pengine: [1150]: info: unpack_config:
Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Jan  1 00:15:36 bh130 daemon.info pengine: [1150]: info:
determine_online_status: Node bh130 is online
Jan  1 00:15:36 bh130 daemon.notice pengine: [1150]: notice:
native_print: bhApp    (ocf::heartbeat:bhApp):    Stopped
Jan  1 00:15:36 bh130 daemon.notice pengine: [1150]: notice:
RecurringOp:  Start recurring monitor (240s) for bhApp on bh130
Jan  1 00:15:36 bh130 daemon.notice pengine: [1150]: notice: LogActions:
Start bhApp    (bh130)
Jan  1 00:15:37 bh130 daemon.info crmd: [1148]: info:
do_state_transition: State transition S_POLICY_ENGINE ->
S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE
origin=handle_response ]
Jan  1 00:15:37 bh130 daemon.info crmd: [1148]: info: unpack_graph:
Unpacked transition 0: 5 actions in 5 synapses
Jan  1 00:15:37 bh130 daemon.info crmd: [1148]: info: do_te_invoke:
Processing graph 0 (ref=pe_calc-dc-936-7) derived from
/usr/var/lib/pengine/pe-input-34.bz2
Jan  1 00:15:37 bh130 daemon.info crmd: [1148]: info: te_rsc_command:
Initiating action 4: monitor bhApp_monitor_0 on bh130 (local)
Jan  1 00:15:37 bh130 daemon.info crmd: [1148]: info: do_lrm_rsc_op:
Performing key=4:0:7:fb4c1819-0064-4651-a330-a2071dc4e495
op=bhApp_monitor_0 )
Jan  1 00:15:37 bh130 daemon.info lrmd: [1145]: info: rsc:bhApp:2: probe
Jan  1 00:15:37 bh130 daemon.info attrd: [1147]: info:
attrd_ha_callback: flush message from bh130
Jan  1 00:15:37 bh130 daemon.info attrd: [1147]: info:
attrd_ha_callback: flush message from bh130
Jan  1 00:15:38 bh130 daemon.info pengine: [1150]: info:
process_pe_message: Transition 0: PEngine Input stored in:
/usr/var/lib/pengine/pe-input-34.bz2
Jan  1 00:15:38 bh130 daemon.info cib: [1151]: info: write_cib_contents:
Archived previous version as /usr/var/lib/heartbeat/crm/cib-93.raw
Jan  1 00:15:38 bh130 daemon.info cib: [1151]: info: write_cib_contents:
Wrote version 0.131.0 of the CIB to disk (digest:
48bbfe23fc163667906a454657f9b229)
Jan  1 00:15:38 bh130 daemon.info cib: [1151]: info: retrieveCib:
Reading cluster configuration from:
/usr/var/lib/heartbeat/crm/cib.RL9oHN (digest:
/usr/var/lib/heartbeat/crm/cib.ypcVer)
Jan  1 00:15:38 bh130 daemon.info crmd: [1148]: info: process_lrm_event:
LRM operation bhApp_monitor_0 (call=2, rc=7, cib-update=25,
confirmed=true) not running
Jan  1 00:15:38 bh130 daemon.info crmd: [1148]: info: match_graph_event:
Action bhApp_monitor_0 (4) confirmed on bh130 (rc=0)
Jan  1 00:15:38 bh130 daemon.info crmd: [1148]: info: te_rsc_command:
Initiating action 3: probe_complete probe_complete on bh130 (local) - no
waiting
Jan  1 00:15:38 bh130 daemon.info attrd: [1147]: info: find_hash_entry:
Creating hash entry for probe_complete
Jan  1 00:15:38 bh130 daemon.info attrd: [1147]: info:
attrd_trigger_update: Sending flush op to all hosts for: probe_complete
(true)
Jan  1 00:15:38 bh130 daemon.info attrd: [1147]: info:
attrd_perform_update: Sent update 8: probe_complete=true
Jan  1 00:15:38 bh130 daemon.info crmd: [1148]: info: te_pseudo_action:
Pseudo action 2 fired and confirmed
Jan  1 00:15:38 bh130 daemon.info crmd: [1148]: info:
abort_transition_graph: te_update_diff:146 - Triggered transition abort
(complete=0, tag=transient_attributes,
id=4e5638b3-0baf-44f3-96bc-f8efd6c10595, magic=NA, cib=0.131.3) :
Transient attribute: update
Jan  1 00:15:38 bh130 daemon.info crmd: [1148]: info:
update_abort_priority: Abort priority upgraded from 0 to 1000000
Jan  1 00:15:38 bh130 daemon.info crmd: [1148]: info:
update_abort_priority: Abort action done superceeded by restart
Jan  1 00:15:38 bh130 daemon.info crmd: [1148]: info: run_graph:
====================================================
Jan  1 00:15:38 bh130 daemon.notice crmd: [1148]: notice: run_graph:
Transition 0 (Complete=3, Pending=0, Fired=0, Skipped=2, Incomplete=0,
Source=/usr/var/lib/pengine/pe-input-34.bz2): Stopped
Jan  1 00:15:38 bh130 daemon.info crmd: [1148]: info: te_graph_trigger:
Transition 0 is now complete
Jan  1 00:15:38 bh130 daemon.info crmd: [1148]: info:
do_state_transition: State transition S_TRANSITION_ENGINE ->
S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
Jan  1 00:15:38 bh130 daemon.info crmd: [1148]: info:
do_state_transition: All 1 cluster nodes are eligible to run resources.
Jan  1 00:15:38 bh130 daemon.info crmd: [1148]: info: do_pe_invoke:
Query 26: Requesting the current CIB: S_POLICY_ENGINE
Jan  1 00:15:38 bh130 daemon.notice pengine: [1150]: notice:
unpack_config: On loss of CCM Quorum: Ignore
Jan  1 00:15:38 bh130 daemon.info pengine: [1150]: info: unpack_config:
Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Jan  1 00:15:38 bh130 daemon.info pengine: [1150]: info:
determine_online_status: Node bh130 is online
Jan  1 00:15:38 bh130 daemon.info crmd: [1148]: info:
do_pe_invoke_callback: Invoking the PE: query=26, ref=pe_calc-dc-938-10,
seq=1, quorate=1
Jan  1 00:15:38 bh130 daemon.notice pengine: [1150]: notice:
native_print: bhApp    (ocf::heartbeat:bhApp):    Stopped
Jan  1 00:15:38 bh130 daemon.notice pengine: [1150]: notice:
RecurringOp:  Start recurring monitor (240s) for bhApp on bh130
Jan  1 00:15:38 bh130 daemon.notice pengine: [1150]: notice: LogActions:
Start bhApp    (bh130)
Jan  1 00:15:38 bh130 daemon.info crmd: [1148]: info:
do_state_transition: State transition S_POLICY_ENGINE ->
S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE
origin=handle_response ]
Jan  1 00:15:38 bh130 daemon.info crmd: [1148]: info: unpack_graph:
Unpacked transition 1: 2 actions in 2 synapses
Jan  1 00:15:39 bh130 daemon.info crmd: [1148]: info: do_te_invoke:
Processing graph 1 (ref=pe_calc-dc-938-10) derived from
/usr/var/lib/pengine/pe-input-35.bz2
Jan  1 00:15:39 bh130 daemon.info crmd: [1148]: info: te_rsc_command:
Initiating action 4: start bhApp_start_0 on bh130 (local)
Jan  1 00:15:39 bh130 daemon.info crmd: [1148]: info: do_lrm_rsc_op:
Performing key=4:1:0:fb4c1819-0064-4651-a330-a2071dc4e495 op=bhApp_start_0 )
Jan  1 00:15:39 bh130 daemon.info lrmd: [1145]: info: rsc:bhApp:3: start
Jan  1 00:15:39 bh130 daemon.info pengine: [1150]: info:
process_pe_message: Transition 1: PEngine Input stored in:
/usr/var/lib/pengine/pe-input-35.bz2
Jan  1 00:15:39 bh130 daemon.info lrmd: [1145]: info: RA output:
(bhApp:start:stderr) sh: you need to specify whom to kill
Jan  1 00:15:39 bh130 daemon.info crmd: [1148]: info: process_lrm_event:
LRM operation bhApp_start_0 (call=3, rc=0, cib-update=27, confirmed=true) ok
Jan  1 00:15:39 bh130 daemon.info crmd: [1148]: info: match_graph_event:
Action bhApp_start_0 (4) confirmed on bh130 (rc=0)
Jan  1 00:15:39 bh130 daemon.info crmd: [1148]: info: te_rsc_command:
Initiating action 5: monitor bhApp_monitor_240000 on bh130 (local)
Jan  1 00:15:39 bh130 daemon.info crmd: [1148]: info: do_lrm_rsc_op:
Performing key=5:1:0:fb4c1819-0064-4651-a330-a2071dc4e495
op=bhApp_monitor_240000 )
Jan  1 00:15:40 bh130 daemon.info crmd: [1148]: info: process_lrm_event:
LRM operation bhApp_monitor_240000 (call=4, rc=0, cib-update=28,
confirmed=false) ok
Jan  1 00:15:40 bh130 daemon.info crmd: [1148]: info: match_graph_event:
Action bhApp_monitor_240000 (5) confirmed on bh130 (rc=0)
Jan  1 00:15:40 bh130 daemon.info crmd: [1148]: info: run_graph:
====================================================
Jan  1 00:15:40 bh130 daemon.notice crmd: [1148]: notice: run_graph:
Transition 1 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/usr/var/lib/pengine/pe-input-35.bz2): Complete
Jan  1 00:15:40 bh130 daemon.info crmd: [1148]: info: te_graph_trigger:
Transition 1 is now complete
Jan  1 00:15:40 bh130 daemon.info crmd: [1148]: info: notify_crmd:
Transition 1 status: done - <null>
Jan  1 00:15:40 bh130 daemon.info crmd: [1148]: info:
do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [
input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Jan  1 00:15:40 bh130 daemon.info crmd: [1148]: info:
do_state_transition: Starting PEngine Recheck Timer
Jan  1 00:15:45 bh130 daemon.crit crmd: [1148]: CRIT:
lrm_connection_destroy: LRM Connection failed
Jan  1 00:15:45 bh130 daemon.info crmd: [1148]: info:
lrm_connection_destroy: LRM Connection disconnected
Jan  1 00:15:45 bh130 daemon.err crmd: [1148]: ERROR: do_log: FSA: Input
I_ERROR from lrm_connection_destroy() received in state S_IDLE
Jan  1 00:15:45 bh130 daemon.info crmd: [1148]: info:
do_state_transition: State transition S_IDLE -> S_RECOVERY [
input=I_ERROR cause=C_FSA_INTERNAL origin=lrm_connection_destroy ]
Jan  1 00:15:45 bh130 daemon.err crmd: [1148]: ERROR: do_recover: Action
A_RECOVER (0000000001000000) not supported
Jan  1 00:15:45 bh130 daemon.warn crmd: [1148]: WARN: do_election_vote:
Not voting in election, we're in state S_RECOVERY
Jan  1 00:15:45 bh130 daemon.info crmd: [1148]: info: do_dc_release: DC
role released
Jan  1 00:15:45 bh130 daemon.info pengine: [1150]: info:
crm_signal_dispatch: Invoking handler for signal 15: Terminated
Jan  1 00:15:45 bh130 daemon.warn lrmd: [1171]: WARN: Initializing
connection to logging daemon failed. Logging daemon may not be running
Jan  1 00:15:45 bh130 daemon.info lrmd: [1171]: info:
G_main_add_SignalHandler: Added signal handler for signal 15
Jan  1 00:15:45 bh130 daemon.info lrmd: [1171]: info:
G_main_add_SignalHandler: Added signal handler for signal 17
Jan  1 00:15:45 bh130 daemon.info lrmd: [1171]: info: enabling coredumps
Jan  1 00:15:45 bh130 daemon.warn lrmd: [1171]: WARN: Core dumps could
be lost if multiple dumps occur.
Jan  1 00:15:45 bh130 daemon.warn lrmd: [1171]: WARN: Consider setting
non-default value in /proc/sys/kernel/core_pattern (or equivalent) for
maximum supportability
Jan  1 00:15:45 bh130 daemon.warn lrmd: [1171]: WARN: Consider setting
/proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
supportability
Jan  1 00:15:45 bh130 daemon.info lrmd: [1171]: info:
G_main_add_SignalHandler: Added signal handler for signal 10
Jan  1 00:15:45 bh130 daemon.info lrmd: [1171]: info:
G_main_add_SignalHandler: Added signal handler for signal 12
Jan  1 00:15:45 bh130 daemon.info lrmd: [1171]: info: Started.
Jan  1 00:15:45 bh130 daemon.info crmd: [1148]: info: stop_subsystem:
Sent -TERM to pengine: [1150]
Jan  1 00:15:45 bh130 daemon.info crmd: [1148]: info: do_te_control:
Transitioner is now inactive
Jan  1 00:15:46 bh130 daemon.info crmd: [1148]: info: do_te_control:
Disconnecting STONITH...
Jan  1 00:15:46 bh130 daemon.info crmd: [1148]: info:
tengine_stonith_connection_destroy: Fencing daemon disconnected
Jan  1 00:15:46 bh130 daemon.notice crmd: [1148]: notice: Not currently
connected.
Jan  1 00:15:46 bh130 daemon.err crmd: [1148]: ERROR: do_log: FSA: Input
I_TERMINATE from do_recover() received in state S_RECOVERY
Jan  1 00:15:46 bh130 daemon.info crmd: [1148]: info:
do_state_transition: State transition S_RECOVERY -> S_TERMINATE [
input=I_TERMINATE cause=C_FSA_INTERNAL origin=do_recover ]
Jan  1 00:15:46 bh130 daemon.info crmd: [1148]: info: do_shutdown:
Terminating the pengine
Jan  1 00:15:46 bh130 daemon.info crmd: [1148]: info: stop_subsystem:
Sent -TERM to pengine: [1150]
Jan  1 00:15:46 bh130 daemon.info crmd: [1148]: info: do_shutdown:
Waiting for subsystems to exit
Jan  1 00:15:46 bh130 daemon.warn crmd: [1148]: WARN:
register_fsa_input_adv: do_shutdown stalled the FSA with pending inputs
Jan  1 00:15:46 bh130 daemon.info crmd: [1148]: info: do_shutdown: All
subsystems stopped, continuing
Jan  1 00:15:46 bh130 daemon.warn crmd: [1148]: WARN: do_log: FSA: Input
I_PENDING from do_election_vote() received in state S_TERMINATE
Jan  1 00:15:46 bh130 daemon.info crmd: [1148]: info: do_shutdown:
Terminating the pengine
Jan  1 00:15:46 bh130 daemon.info crmd: [1148]: info: stop_subsystem:
Sent -TERM to pengine: [1150]
Jan  1 00:15:46 bh130 daemon.info crmd: [1148]: info: do_shutdown:
Waiting for subsystems to exit
Jan  1 00:15:46 bh130 daemon.warn crmd: [1148]: WARN:
register_fsa_input_adv: do_shutdown stalled the FSA with pending inputs
Jan  1 00:15:46 bh130 daemon.info crmd: [1148]: info: do_shutdown: All
subsystems stopped, continuing
Jan  1 00:15:46 bh130 daemon.info crmd: [1148]: info:
crmdManagedChildDied: Process pengine:[1150] exited (signal=0, exitcode=0)
Jan  1 00:15:46 bh130 daemon.info crmd: [1148]: info: pe_msg_dispatch:
Received HUP from pengine:[1150]
Jan  1 00:15:46 bh130 daemon.info crmd: [1148]: info:
pe_connection_destroy: Connection to the Policy Engine released
Jan  1 00:15:46 bh130 daemon.warn crmd: [1148]: WARN: do_log: FSA: Input
I_RELEASE_SUCCESS from do_dc_release() received in state S_TERMINATE
Jan  1 00:15:46 bh130 daemon.info crmd: [1148]: info: do_shutdown: All
subsystems stopped, continuing
Jan  1 00:15:46 bh130 daemon.info ccm: [1143]: info: client (pid=1148)
removed from ccm
Jan  1 00:15:46 bh130 daemon.info crmd: [1148]: info: do_ha_control:
Disconnected from Heartbeat
Jan  1 00:15:46 bh130 daemon.info crmd: [1148]: info: do_cib_control:
Disconnecting CIB
Jan  1 00:15:46 bh130 daemon.info cib: [1144]: info:
cib_process_readwrite: We are now in R/O mode
Jan  1 00:15:46 bh130 daemon.info crmd: [1148]: info:
crmd_cib_connection_destroy: Connection to the CIB terminated...
Jan  1 00:15:46 bh130 daemon.info crmd: [1148]: info: do_exit:
Performing A_EXIT_0 - gracefully exiting the CRMd
Jan  1 00:15:46 bh130 daemon.err crmd: [1148]: ERROR: do_exit: Could not
recover from internal error
Jan  1 00:15:46 bh130 daemon.info crmd: [1148]: info: free_mem: Dropping
I_TERMINATE: [ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_stop ]
Jan  1 00:15:46 bh130 daemon.info crmd: [1148]: info: do_exit: [crmd]
stopped (2)


_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
