I wouldn't expect resources to be released in this scenario, though.
What is the right thing to do here? Do I need to run a quorumd server?
Our communication settings from ha.cf are:
baud 19200
serial /dev/ttyS0
ucast bond1 192.168.50.1
Here are some of the logs:
pengine[10338]: 2008/05/19_10:36:51 info: determine_online_status:
Node
dbnya1.mycompany.com is shutting down
pengine[10338]: 2008/05/19_10:36:51 notice: group_print: Resource
Group:
pgsql.myapp.group
pengine[10338]: 2008/05/19_10:36:51 notice: native_print:
pgsql.myapp.ip (heartbeat::ocf:IPaddr): Started
dbnya2.mycompany.com
pengine[10338]: 2008/05/19_10:36:51 notice: native_print: pgsql.myapp.fsData (heartbeat::ocf:Filesystem): Started dbnya2.mycompany.com
pengine[10338]: 2008/05/19_10:36:51 notice: native_print: pgsql.myapp.fsTxnLog (heartbeat::ocf:Filesystem): Started dbnya2.mycompany.com
pengine[10338]: 2008/05/19_10:36:51 notice: native_print:
pgsql.myapp.nfslock (lsb:nfslock): Started dbnya2.mycompany.com
pengine[10338]: 2008/05/19_10:36:51 notice: native_print:
pgsql.myapp.nfs (lsb:nfs-mycompany): Started
dbnya2.mycompany.com
pengine[10338]: 2008/05/19_10:36:51 notice: native_print:
pgsql.myapp.pgsql (heartbeat::ocf:pgsql): Started dbnya2.mycompany.com
pengine[10338]: 2008/05/19_10:36:51 notice: NoRoleChange: Leave
resource
pgsql.myapp.ip (dbnya2.mycompany.com)
pengine[10338]: 2008/05/19_10:36:51 notice: NoRoleChange: Leave
resource
pgsql.myapp.fsData (dbnya2.mycompany.com)
pengine[10338]: 2008/05/19_10:36:51 notice: NoRoleChange: Leave
resource
pgsql.myapp.fsTxnLog (dbnya2.mycompany.com)
pengine[10338]: 2008/05/19_10:36:51 notice: NoRoleChange: Leave
resource
pgsql.myapp.nfslock (dbnya2.mycompany.com)
pengine[10338]: 2008/05/19_10:36:51 notice: NoRoleChange: Leave
resource
pgsql.myapp.nfs (dbnya2.mycompany.com)
pengine[10338]: 2008/05/19_10:36:51 notice: NoRoleChange: Leave
resource
pgsql.myapp.pgsql (dbnya2.mycompany.com)
crmd[8999]: 2008/05/19_10:36:51 info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
pengine[10338]: 2008/05/19_10:36:51 info: stage6: Scheduling Node
dbnya1.mycompany.com for shutdown
tengine[10337]: 2008/05/19_10:36:51 info: unpack_graph: Unpacked
transition
3: 1 actions in 1 synapses
tengine[10337]: 2008/05/19_10:36:51 info: te_crm_command: Executing
crm-event (22): do_shutdown on dbnya1.mycompany.com
pengine[10338]: 2008/05/19_10:36:51 info: process_pe_message:
Transition 3:
PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-143.bz2
crmd[8999]: 2008/05/19_10:36:51 notice: crmd_client_status_callback:
Status
update: Client dbnya1.mycompany.com/crmd now has status [offline]
cib[8995]: 2008/05/19_10:36:52 info: cib_process_shutdown_req:
Shutdown REQ
from dbnya1.mycompany.com
cib[8995]: 2008/05/19_10:36:52 info: cib_client_status_callback:
Status
update: Client dbnya1.mycompany.com/cib now has status [leave]
crmd[8999]: 2008/05/19_10:36:52 info: mem_handle_event: Got an event
OC_EV_MS_INVALID from ccm
cib[8995]: 2008/05/19_10:36:52 info: mem_handle_event: Got an event
OC_EV_MS_INVALID from ccm
crmd[8999]: 2008/05/19_10:36:52 info: mem_handle_event: no mbr_track
info
cib[8995]: 2008/05/19_10:36:52 info: mem_handle_event: no mbr_track
info
crmd[8999]: 2008/05/19_10:36:52 info: mem_handle_event: Got an event
OC_EV_MS_INVALID from ccm
tengine[10337]: 2008/05/19_10:36:52 info: update_abort_priority: Abort
priority upgraded to 1000000
cib[8995]: 2008/05/19_10:36:52 info: mem_handle_event: Got an event
OC_EV_MS_INVALID from ccm
crmd[8999]: 2008/05/19_10:36:52 info: mem_handle_event: instance=10,
nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=4
tengine[10337]: 2008/05/19_10:36:52 info: update_abort_priority: Abort
action 0 superceeded by 2
cib[8995]: 2008/05/19_10:36:52 info: mem_handle_event: instance=10,
nodes=1,
new=0, lost=1, n_idx=0, new_idx=1, old_idx=4
crmd[8999]: 2008/05/19_10:36:52 info: crmd_ccm_msg_callback: Quorum
lost
after event=INVALID (id=10)
cib[8995]: 2008/05/19_10:36:52 info: cib_ccm_msg_callback: LOST:
dbnya1.mycompany.com
crmd[8999]: 2008/05/19_10:36:52 info: crmd_ccm_msg_callback: Quorum
lost:
triggering transition (INVALID)
cib[8995]: 2008/05/19_10:36:52 info: cib_ccm_msg_callback: PEER:
dbnya2.mycompany.com
crmd[8999]: 2008/05/19_10:36:52 info: ccm_event_detail: INVALID:
trans=10,
nodes=1, new=0, lost=1 n_idx=0, new_idx=1, old_idx=4
crmd[8999]: 2008/05/19_10:36:52 info: ccm_event_detail:
CURRENT:
dbnya2.mycompany.com [nodeid=1, born=10]
crmd[8999]: 2008/05/19_10:36:52 info: ccm_event_detail: LOST:
dbnya1.mycompany.com [nodeid=0, born=9]
tengine[10337]: 2008/05/19_10:36:52 info: run_graph: Transition 3:
(Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0)
crmd[8999]: 2008/05/19_10:36:52 info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ]
crmd[8999]: 2008/05/19_10:36:52 info: do_state_transition: All 1
cluster
nodes are eligible to run resources.
pengine[10338]: 2008/05/19_10:36:52 WARN: cluster_status: We do not
have
quorum - fencing and resource management disabled
pengine[10338]: 2008/05/19_10:36:52 info: determine_online_status:
Node
dbnya2.mycompany.com is online
pengine[10338]: 2008/05/19_10:36:52 notice: group_print: Resource
Group:
pgsql.myapp.group
pengine[10338]: 2008/05/19_10:36:52 notice: native_print:
pgsql.myapp.ip (heartbeat::ocf:IPaddr): Started
dbnya2.mycompany.com
pengine[10338]: 2008/05/19_10:36:52 notice: native_print: pgsql.myapp.fsData (heartbeat::ocf:Filesystem): Started dbnya2.mycompany.com
pengine[10338]: 2008/05/19_10:36:52 notice: native_print: pgsql.myapp.fsTxnLog (heartbeat::ocf:Filesystem): Started dbnya2.mycompany.com
pengine[10338]: 2008/05/19_10:36:52 notice: native_print:
pgsql.myapp.nfslock (lsb:nfslock): Started dbnya2.mycompany.com
pengine[10338]: 2008/05/19_10:36:52 notice: native_print:
pgsql.myapp.nfs (lsb:nfs-mycompany): Started
dbnya2.mycompany.com
pengine[10338]: 2008/05/19_10:36:52 notice: native_print:
pgsql.myapp.pgsql (heartbeat::ocf:pgsql): Started dbnya2.mycompany.com
pengine[10338]: 2008/05/19_10:36:52 notice: StopRsc:
dbnya2.mycompany.com
Stop pgsql.myapp.ip
pengine[10338]: 2008/05/19_10:36:52 notice: StopRsc:
dbnya2.mycompany.com
Stop pgsql.myapp.fsData
pengine[10338]: 2008/05/19_10:36:52 notice: StopRsc:
dbnya2.mycompany.com
Stop pgsql.myapp.fsTxnLog
pengine[10338]: 2008/05/19_10:36:52 notice: StopRsc:
dbnya2.mycompany.com
Stop pgsql.myapp.nfslock
pengine[10338]: 2008/05/19_10:36:52 notice: StopRsc:
dbnya2.mycompany.com
Stop pgsql.myapp.nfs
pengine[10338]: 2008/05/19_10:36:52 notice: StopRsc:
dbnya2.mycompany.com
Stop pgsql.myapp.pgsql
crmd[8999]: 2008/05/19_10:36:52 info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
tengine[10337]: 2008/05/19_10:36:52 info: unpack_graph: Unpacked
transition
4: 9 actions in 9 synapses
tengine[10337]: 2008/05/19_10:36:52 info: te_pseudo_action: Pseudo
action 18
fired and confirmed
tengine[10337]: 2008/05/19_10:36:52 info: send_rsc_command: Initiating
action 14: pgsql.myapp.pgsql_stop_0 on dbnya2.mycompany.com
crmd[8999]: 2008/05/19_10:36:52 info: do_lrm_rsc_op: Performing
op=pgsql.myapp.pgsql_stop_0
key=14:4:60fbb97d-8b53-401c-85c6-7f7e57de9b00)
lrmd[8996]: 2008/05/19_10:36:52 info: rsc:pgsql.myapp.pgsql: stop
pengine[10338]: 2008/05/19_10:36:52 info: process_pe_message:
Transition 4:
PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-144.bz2
pgsql[19654][19679]: 2008/05/19_10:36:53 INFO: PostgreSQL is down
crmd[8999]: 2008/05/19_10:36:54 info: process_lrm_event: LRM operation
pgsql.myapp.pgsql_stop_0 (call=27, rc=0) complete
tengine[10337]: 2008/05/19_10:36:54 info: match_graph_event: Action
pgsql.myapp.pgsql_stop_0 (14) confirmed on dbnya2.mycompany.com (rc=0)
tengine[10337]: 2008/05/19_10:36:54 info: send_rsc_command: Initiating
action 12: pgsql.myapp.nfs_stop_0 on dbnya2.mycompany.com
crmd[8999]: 2008/05/19_10:36:54 info: do_lrm_rsc_op: Performing
op=pgsql.myapp.nfs_stop_0
key=12:4:60fbb97d-8b53-401c-85c6-7f7e57de9b00)
lrmd[8996]: 2008/05/19_10:36:54 info: rsc:pgsql.myapp.nfs: stop
lrmd[19683]: 2008/05/19_10:36:54 WARN: For LSB init script, no
additional
parameters are needed.
lrmd[8996]: 2008/05/19_10:36:54 info: RA output:
(pgsql.myapp.nfs:stop:stdout) Shutting down NFS mountd:
lrmd[8996]: 2008/05/19_10:36:55 info: RA output:
(pgsql.myapp.nfs:stop:stdout) [
lrmd[8996]: 2008/05/19_10:36:55 info: RA output:
(pgsql.myapp.nfs:stop:stdout) OK ]
lrmd[8996]: 2008/05/19_10:36:55 info: RA output:
(pgsql.myapp.nfs:stop:stdout)
lrmd[8996]: 2008/05/19_10:36:55 info: RA output:
(pgsql.myapp.nfs:stop:stdout)
lrmd[8996]: 2008/05/19_10:36:55 info: RA output:
(pgsql.myapp.nfs:stop:stdout) Shutting down NFS daemon:
lrmd[8996]: 2008/05/19_10:36:55 info: RA output:
(pgsql.myapp.nfs:stop:stdout) [
lrmd[8996]: 2008/05/19_10:36:55 info: RA output:
(pgsql.myapp.nfs:stop:stdout) OK
lrmd[8996]: 2008/05/19_10:36:55 info: RA output:
(pgsql.myapp.nfs:stop:stdout) ]
lrmd[8996]: 2008/05/19_10:36:55 info: RA output:
(pgsql.myapp.nfs:stop:stdout)
lrmd[8996]: 2008/05/19_10:36:58 info: RA output: (pgsql.myapp.nfs:stop:stdout) nfsd (pid 19578 19577 19576 19575 19574 19573 19572 19571) is running...
lrmd[8996]: 2008/05/19_10:36:58 info: RA output:
(pgsql.myapp.nfs:stop:stdout) Force-killing nfs daemon:
lrmd[8996]: 2008/05/19_10:37:02 info: RA output:
(pgsql.myapp.nfs:stop:stdout) [
lrmd[8996]: 2008/05/19_10:37:02 info: RA output:
(pgsql.myapp.nfs:stop:stdout) OK ]
lrmd[8996]: 2008/05/19_10:37:02 info: RA output:
(pgsql.myapp.nfs:stop:stdout)
lrmd[8996]: 2008/05/19_10:37:02 info: RA output:
(pgsql.myapp.nfs:stop:stdout) Shutting down NFS quotas:
lrmd[8996]: 2008/05/19_10:37:02 info: RA output:
(pgsql.myapp.nfs:stop:stdout) [
lrmd[8996]: 2008/05/19_10:37:02 info: RA output:
(pgsql.myapp.nfs:stop:stdout) OK ]
lrmd[8996]: 2008/05/19_10:37:02 info: RA output:
(pgsql.myapp.nfs:stop:stdout)
lrmd[8996]: 2008/05/19_10:37:02 info: RA output:
(pgsql.myapp.nfs:stop:stdout) Shutting down NFS services:
lrmd[8996]: 2008/05/19_10:37:02 info: RA output:
(pgsql.myapp.nfs:stop:stdout) [
lrmd[8996]: 2008/05/19_10:37:02 info: RA output:
(pgsql.myapp.nfs:stop:stdout) OK
lrmd[8996]: 2008/05/19_10:37:02 info: RA output:
(pgsql.myapp.nfs:stop:stdout) ]
lrmd[8996]: 2008/05/19_10:37:02 info: RA output:
(pgsql.myapp.nfs:stop:stdout)
lrmd[8996]: 2008/05/19_10:37:02 info: RA output:
(pgsql.myapp.nfs:stop:stdout)
crmd[8999]: 2008/05/19_10:37:02 info: process_lrm_event: LRM operation
pgsql.myapp.nfs_stop_0 (call=28, rc=0) complete
tengine[10337]: 2008/05/19_10:37:02 info: match_graph_event: Action
pgsql.myapp.nfs_stop_0 (12) confirmed on dbnya2.mycompany.com (rc=0)
tengine[10337]: 2008/05/19_10:37:02 info: send_rsc_command: Initiating
action 10: pgsql.myapp.nfslock_stop_0 on dbnya2.mycompany.com
crmd[8999]: 2008/05/19_10:37:02 info: do_lrm_rsc_op: Performing
op=pgsql.myapp.nfslock_stop_0
key=10:4:60fbb97d-8b53-401c-85c6-7f7e57de9b00)
lrmd[8996]: 2008/05/19_10:37:02 info: rsc:pgsql.myapp.nfslock: stop
lrmd[19726]: 2008/05/19_10:37:02 WARN: For LSB init script, no
additional
parameters are needed.
lrmd[8996]: 2008/05/19_10:37:02 info: RA output:
(pgsql.myapp.nfslock:stop:stdout) Stopping NFS statd:
lrmd[8996]: 2008/05/19_10:37:02 info: RA output:
(pgsql.myapp.nfslock:stop:stdout) [
lrmd[8996]: 2008/05/19_10:37:02 info: RA output:
(pgsql.myapp.nfslock:stop:stdout) OK ]
lrmd[8996]: 2008/05/19_10:37:02 info: RA output:
(pgsql.myapp.nfslock:stop:stdout)
crmd[8999]: 2008/05/19_10:37:02 info: process_lrm_event: LRM operation
pgsql.myapp.nfslock_stop_0 (call=29, rc=0) complete
tengine[10337]: 2008/05/19_10:37:02 info: match_graph_event: Action
pgsql.myapp.nfslock_stop_0 (10) confirmed on dbnya2.mycompany.com
(rc=0)
tengine[10337]: 2008/05/19_10:37:02 info: send_rsc_command: Initiating
action 8: pgsql.myapp.fsTxnLog_stop_0 on dbnya2.mycompany.com
crmd[8999]: 2008/05/19_10:37:02 info: do_lrm_rsc_op: Performing
op=pgsql.myapp.fsTxnLog_stop_0
key=8:4:60fbb97d-8b53-401c-85c6-7f7e57de9b00)
lrmd[8996]: 2008/05/19_10:37:02 info: rsc:pgsql.myapp.fsTxnLog: stop
Filesystem[19747][19777]: 2008/05/19_10:37:02 INFO: Running stop for
/dev/mapper/md3000-txnlogp1 on /opt/data/md3000/txnlog
Filesystem[19747][19787]: 2008/05/19_10:37:02 INFO: Trying to unmount
/opt/data/md3000/txnlog
Filesystem[19747][19793]: 2008/05/19_10:37:02 INFO: unmounted
/opt/data/md3000/txnlog successfully
crmd[8999]: 2008/05/19_10:37:02 info: process_lrm_event: LRM operation
pgsql.myapp.fsTxnLog_stop_0 (call=30, rc=0) complete
tengine[10337]: 2008/05/19_10:37:02 info: match_graph_event: Action
pgsql.myapp.fsTxnLog_stop_0 (8) confirmed on dbnya2.mycompany.com
(rc=0)
tengine[10337]: 2008/05/19_10:37:02 info: send_rsc_command: Initiating
action 6: pgsql.myapp.fsData_stop_0 on dbnya2.mycompany.com
crmd[8999]: 2008/05/19_10:37:02 info: do_lrm_rsc_op: Performing
op=pgsql.myapp.fsData_stop_0
key=6:4:60fbb97d-8b53-401c-85c6-7f7e57de9b00)
lrmd[8996]: 2008/05/19_10:37:02 info: rsc:pgsql.myapp.fsData: stop
Filesystem[19808][19838]: 2008/05/19_10:37:02 INFO: Running stop for
/dev/mapper/md3000-datap1 on /opt/data/md3000/data
Filesystem[19808][19848]: 2008/05/19_10:37:02 INFO: Trying to unmount
/opt/data/md3000/data
Filesystem[19808][19859]: 2008/05/19_10:37:02 INFO: unmounted
/opt/data/md3000/data successfully
crmd[8999]: 2008/05/19_10:37:02 info: process_lrm_event: LRM operation
pgsql.myapp.fsData_stop_0 (call=31, rc=0) complete
tengine[10337]: 2008/05/19_10:37:02 info: match_graph_event: Action
pgsql.myapp.fsData_stop_0 (6) confirmed on dbnya2.mycompany.com (rc=0)
tengine[10337]: 2008/05/19_10:37:02 info: send_rsc_command: Initiating
action 4: pgsql.myapp.ip_stop_0 on dbnya2.mycompany.com
crmd[8999]: 2008/05/19_10:37:02 info: do_lrm_rsc_op: Performing
op=pgsql.myapp.ip_stop_0 key=4:4:60fbb97d-8b53-401c-85c6-7f7e57de9b00)
lrmd[8996]: 2008/05/19_10:37:02 info: rsc:pgsql.myapp.ip: stop
lrmd[8996]: 2008/05/19_10:37:02 info: RA output:
(pgsql.myapp.ip:stop:stdout) In IP Stop
lrmd[8996]: 2008/05/19_10:37:02 info: RA output:
(pgsql.myapp.ip:stop:stderr) SIOCDELRT: No such process
IPaddr[19869][19884]: 2008/05/19_10:37:02 INFO: ifconfig bond0:0 down
crmd[8999]: 2008/05/19_10:37:02 info: process_lrm_event: LRM operation
pgsql.myapp.ip_stop_0 (call=32, rc=0) complete
tengine[10337]: 2008/05/19_10:37:02 info: match_graph_event: Action
pgsql.myapp.ip_stop_0 (4) confirmed on dbnya2.mycompany.com (rc=0)
tengine[10337]: 2008/05/19_10:37:02 info: te_pseudo_action: Pseudo
action 19
fired and confirmed
tengine[10337]: 2008/05/19_10:37:02 info: te_pseudo_action: Pseudo
action 3
fired and confirmed
crmd[8999]: 2008/05/19_10:37:02 info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
tengine[10337]: 2008/05/19_10:37:02 info: run_graph: Transition 4:
(Complete=9, Pending=0, Fired=0, Skipped=0, Incomplete=0)
tengine[10337]: 2008/05/19_10:37:02 info: notify_crmd: Transition 4
status:
te_complete - <null>
heartbeat[7257]: 2008/05/19_10:37:23 WARN: node
dbnya1.mycompany.com: is
dead
crmd[8999]: 2008/05/19_10:37:23 notice: crmd_ha_status_callback:
Status
update: Node dbnya1.mycompany.com now has status [dead]
heartbeat[7257]: 2008/05/19_10:37:23 info: Link
dbnya1.mycompany.com:/dev/ttyS0
dead.
heartbeat[7257]: 2008/05/19_10:37:23 info: Link
dbnya1.mycompany.com:bond1
dead.
cib[8995]: 2008/05/19_10:40:45 info: cib_stats: Processed 29
operations
(2758.00us average, 0% utilization) in the last 10min
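
For anyone who wants to pull the key events out of a log like the one above, here is a minimal sketch in Python. It is not part of Heartbeat; the names `parse_entries` and `key_events` are made up for illustration, and the entry format is only assumed from the lines in this excerpt (daemon name, one or more bracketed PIDs, a timestamp, a level, then the message). Because the mail client wrapped long lines, any line that does not start a new entry is appended to the previous one with a single space, which will leave a stray space wherever a wrap fell in the middle of a word.

```python
import re
from typing import Iterable, List, NamedTuple

# Header of a log entry as it appears in the excerpt, for example:
#   crmd[8999]: 2008/05/19_10:36:52 info: crmd_ccm_msg_callback: Quorum
# This pattern is an assumption based on those lines only.
ENTRY_RE = re.compile(
    r"^(?P<daemon>[\w-]+)(?:\[\d+\])+: "
    r"(?P<stamp>\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2}) "
    r"(?P<level>\w+): (?P<text>.*)$"
)


class Entry(NamedTuple):
    daemon: str
    stamp: str
    level: str
    text: str


def parse_entries(lines: Iterable[str]) -> List[Entry]:
    """Group wrapped lines into entries (hypothetical helper)."""
    entries: List[Entry] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        m = ENTRY_RE.match(line)
        if m:
            entries.append(
                Entry(m["daemon"], m["stamp"], m["level"], m["text"].strip())
            )
        elif entries:
            # Continuation of a wrapped line: glue it to the previous entry.
            last = entries[-1]
            entries[-1] = last._replace(text=(last.text + " " + line).strip())
    return entries


def key_events(
    entries: Iterable[Entry],
    needles=("Quorum lost", "do not have quorum", "StopRsc"),
) -> List[Entry]:
    """Keep only entries whose message mentions one of the needles."""
    return [e for e in entries if any(n in e.text for n in needles)]


if __name__ == "__main__":
    sample = [
        "crmd[8999]: 2008/05/19_10:36:52 info: crmd_ccm_msg_callback: Quorum",
        "lost",
        "after event=INVALID (id=10)",
        "pengine[10338]: 2008/05/19_10:36:52 notice: StopRsc:",
        "dbnya2.mycompany.com",
        "Stop pgsql.myapp.ip",
    ]
    for event in key_events(parse_entries(sample)):
        print(event.stamp, event.daemon, event.text)
```

Run against the full log, this should show the "Quorum lost" messages from crmd at 10:36:52 followed by the StopRsc decisions from pengine, which is the sequence in question.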