Hi All,

I have a setup where two nodes, Server1 and Server2, are synchronized by DRBD and managed by Corosync. DRBD runs in Primary/Primary (dual-primary) mode.

When I shut down Server1, Server2 waits for some time and then restarts by itself. The other way around, when Server2 is shut down while Server1 stays online, nothing happens to Server1.

Please find the log and the package versions attached below. I understand that fencing is causing this, but is there any way to disable it?
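For context, the log below shows the fence-peer handler being invoked (crm-fence-peer.sh for resource r0). The fencing behaviour in question is configured along these lines in the DRBD resource file; this is a minimal sketch assuming the stock Ubuntu handler paths, not the exact contents of my config:

```
# Hypothetical excerpt of a dual-primary resource definition (DRBD 8.3 syntax)
resource r0 {
  net {
    allow-two-primaries;          # dual-primary mode
  }
  disk {
    # 'resource-and-stonith' suspends I/O on connection loss and calls the
    # fence-peer handler. Setting this to 'dont-care' disables fencing, but
    # that risks split-brain in a dual-primary setup.
    fencing resource-and-stonith;
  }
  handlers {
    fence-peer         "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```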
--
Regards,
Kamal Kishore B V
root@server2:~# cat /etc/issue && dpkg -l | egrep "corosync|pacemaker|drbd|xen|ocfs2"
Ubuntu 12.04 LTS \n \l
ii corosync 1.4.2-2ubuntu0.1
Standards-based cluster framework (daemon and modules)
hi drbd8-utils 2:8.3.11-0ubuntu1
RAID 1 over tcp/ip for Linux utilities
ii libxen-4.1 4.1.5-0ubuntu0.12.04.3
Public libs for Xen
ii libxenstore3.0 4.1.5-0ubuntu0.12.04.3
Xenstore communications library for Xen
ii ocfs2-tools 1.6.3-4ubuntu1
tools for managing OCFS2 cluster filesystems
ii pacemaker 1.1.6-2ubuntu3.2
HA cluster resource manager
ii xen-hypervisor-4.1-amd64 4.1.5-0ubuntu0.12.04.3
Xen Hypervisor on AMD64
ii xen-utils-4.1 4.1.5-0ubuntu0.12.04.3
XEN administrative tools
ii xen-utils-common 4.1.2-1ubuntu1
XEN administrative tools - common files
ii xenstore-utils 4.1.5-0ubuntu0.12.04.3
Xenstore utilities for Xen
cat /etc/issue && dpkg -l | egrep "bridge-utils"
bridge-utils 1.5-2ubuntu7 Utilities for configuring the Linux Ethernet bridge
cat /etc/issue && dpkg -l | egrep "ssh"
Ubuntu 12.04 LTS \n \l
ii libssh-4 0.5.2-1
tiny C SSH library
ii openssh-client 1:5.9p1-5ubuntu1.4
secure shell (SSH) client, for secure access to remote machines
ii openssh-server 1:5.9p1-5ubuntu1.4
secure shell (SSH) server, for secure access from remote machines
ii ssh-askpass-gnome 1:5.9p1-5ubuntu1
interactive X program to prompt users for a passphrase for ssh-add
ii ssh-import-id 2.10-0ubuntu1
securely retrieve an SSH public key and install it locally
cat /etc/issue && dpkg -l | egrep "vncviewer"
Ubuntu 12.04 LTS \n \l
ii xtightvncviewer 1.3.9-6.2ubuntu2
virtual network computing client software for X
Apr 18 02:49:03 server2 crmd: [1181]: WARN: check_dead_member: Our DC node
(server1) left the cluster
Apr 18 02:49:03 server2 crmd: [1181]: info: do_state_transition: State
transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL
origin=check_dead_member ]
Apr 18 02:49:03 server2 crmd: [1181]: info: update_dc: Unset DC server1
Apr 18 02:49:03 server2 crmd: [1181]: info: do_state_transition: State
transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_FSA_INTERNAL origin=do_election_check ]
Apr 18 02:49:03 server2 crmd: [1181]: info: do_te_control: Registering TE UUID:
11058a2e-0c23-4f0d-a911-65a5eb1b74a0
Apr 18 02:49:03 server2 crmd: [1181]: info: set_graph_functions: Setting custom
graph functions
Apr 18 02:49:03 server2 crmd: [1181]: info: unpack_graph: Unpacked transition
-1: 0 actions in 0 synapses
Apr 18 02:49:03 server2 crmd: [1181]: info: do_dc_takeover: Taking over DC
status for this partition
Apr 18 02:49:03 server2 cib: [1177]: info: cib_process_readwrite: We are now in
R/W mode
Apr 18 02:49:03 server2 cib: [1177]: info: cib_process_request: Operation
complete: op cib_master for section 'all' (origin=local/crmd/17,
version=0.116.8): ok (rc=0)
Apr 18 02:49:03 server2 cib: [1177]: info: cib_process_request: Operation
complete: op cib_modify for section cib (origin=local/crmd/18,
version=0.116.9): ok (rc=0)
Apr 18 02:49:03 server2 cib: [1177]: info: cib_process_request: Operation
complete: op cib_modify for section crm_config (origin=local/crmd/20,
version=0.116.10): ok (rc=0)
Apr 18 02:49:03 server2 crmd: [1181]: info: join_make_offer: Making join offers
based on membership 156200
Apr 18 02:49:03 server2 cib: [1177]: info: cib_process_request: Operation
complete: op cib_modify for section crm_config (origin=local/crmd/22,
version=0.116.11): ok (rc=0)
Apr 18 02:49:03 server2 crmd: [1181]: info: do_dc_join_offer_all: join-1:
Waiting on 1 outstanding join acks
Apr 18 02:49:03 server2 crmd: [1181]: info: ais_dispatch_message: Membership
156200: quorum still lost
Apr 18 02:49:03 server2 crmd: [1181]: info: crmd_ais_dispatch: Setting expected
votes to 2
Apr 18 02:49:03 server2 cib: [1177]: info: cib_process_request: Operation
complete: op cib_modify for section crm_config (origin=local/crmd/25,
version=0.116.12): ok (rc=0)
Apr 18 02:49:03 server2 crmd: [1181]: info: update_dc: Set DC to server2 (3.0.5)
Apr 18 02:49:03 server2 crmd: [1181]: info: config_query_callback: Shutdown
escalation occurs after: 1200000ms
Apr 18 02:49:03 server2 crmd: [1181]: info: config_query_callback: Checking for
expired actions every 900000ms
Apr 18 02:49:03 server2 crmd: [1181]: info: config_query_callback: Sending
expected-votes=2 to corosync
Apr 18 02:49:03 server2 crmd: [1181]: info: ais_dispatch_message: Membership
156200: quorum still lost
Apr 18 02:49:03 server2 crmd: [1181]: info: crmd_ais_dispatch: Setting expected
votes to 2
Apr 18 02:49:03 server2 cib: [1177]: info: cib_process_request: Operation
complete: op cib_modify for section crm_config (origin=local/crmd/28,
version=0.116.13): ok (rc=0)
Apr 18 02:49:03 server2 crmd: [1181]: info: do_state_transition: State
transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED
cause=C_FSA_INTERNAL origin=check_join_state ]
Apr 18 02:49:03 server2 crmd: [1181]: info: do_state_transition: All 1 cluster
nodes responded to the join offer.
Apr 18 02:49:03 server2 crmd: [1181]: info: do_dc_join_finalize: join-1:
Syncing the CIB from server2 to the rest of the cluster
Apr 18 02:49:03 server2 cib: [1177]: info: cib_process_request: Operation
complete: op cib_sync for section 'all' (origin=local/crmd/29,
version=0.116.13): ok (rc=0)
Apr 18 02:49:03 server2 cib: [1177]: info: cib_process_request: Operation
complete: op cib_modify for section nodes (origin=local/crmd/30,
version=0.116.14): ok (rc=0)
Apr 18 02:49:03 server2 crmd: [1181]: info: do_dc_join_ack: join-1: Updating
node state to member for server2
Apr 18 02:49:03 server2 cib: [1177]: info: cib_process_request: Operation
complete: op cib_delete for section //node_state[@uname='server2']/lrm
(origin=local/crmd/31, version=0.116.15): ok (rc=0)
Apr 18 02:49:03 server2 crmd: [1181]: info: erase_xpath_callback: Deletion of
"//node_state[@uname='server2']/lrm": ok (rc=0)
Apr 18 02:49:03 server2 crmd: [1181]: info: do_state_transition: State
transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED
cause=C_FSA_INTERNAL origin=check_join_state ]
Apr 18 02:49:03 server2 crmd: [1181]: info: do_state_transition: All 1 cluster
nodes are eligible to run resources.
Apr 18 02:49:03 server2 crmd: [1181]: info: do_dc_join_final: Ensuring DC,
quorum and node attributes are up-to-date
Apr 18 02:49:03 server2 crmd: [1181]: info: crm_update_quorum: Updating quorum
status to false (call=35)
Apr 18 02:49:03 server2 crmd: [1181]: info: abort_transition_graph:
do_te_invoke:167 - Triggered transition abort (complete=1) : Peer Cancelled
Apr 18 02:49:03 server2 crmd: [1181]: info: do_pe_invoke: Query 36: Requesting
the current CIB: S_POLICY_ENGINE
Apr 18 02:49:03 server2 attrd: [1179]: notice: attrd_local_callback: Sending
full refresh (origin=crmd)
Apr 18 02:49:03 server2 attrd: [1179]: notice: attrd_trigger_update: Sending
flush op to all hosts for: probe_complete (true)
Apr 18 02:49:03 server2 cib: [1177]: info: cib_process_request: Operation
complete: op cib_modify for section nodes (origin=local/crmd/33,
version=0.116.17): ok (rc=0)
Apr 18 02:49:03 server2 crmd: [1181]: WARN: match_down_event: No match for
shutdown action on server1
Apr 18 02:49:03 server2 crmd: [1181]: info: te_update_diff: Stonith/shutdown of
server1 not matched
Apr 18 02:49:03 server2 cib: [1177]: info: cib_process_request: Operation
complete: op cib_modify for section cib (origin=local/crmd/35,
version=0.116.19): ok (rc=0)
Apr 18 02:49:03 server2 crmd: [1181]: info: abort_transition_graph:
te_update_diff:215 - Triggered transition abort (complete=1, tag=node_state,
id=server1, magic=NA, cib=0.116.18) : Node failure
Apr 18 02:49:03 server2 crmd: [1181]: info: do_pe_invoke: Query 37: Requesting
the current CIB: S_POLICY_ENGINE
Apr 18 02:49:03 server2 crmd: [1181]: info: do_pe_invoke_callback: Invoking the
PE: query=37, ref=pe_calc-dc-1429305543-16, seq=156200, quorate=0
Apr 18 02:49:03 server2 attrd: [1179]: notice: attrd_trigger_update: Sending
flush op to all hosts for: master-resDRBDr1:0 (10000)
Apr 18 02:49:03 server2 pengine: [1180]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Apr 18 02:49:03 server2 pengine: [1180]: notice: RecurringOp: Start recurring
monitor (20s) for resXen1 on server2
Apr 18 02:49:03 server2 pengine: [1180]: notice: LogActions: Start
resXen1#011(server2)
Apr 18 02:49:03 server2 pengine: [1180]: notice: LogActions: Leave
resDRBDr1:0#011(Master server2)
Apr 18 02:49:03 server2 pengine: [1180]: notice: LogActions: Leave
resDRBDr1:1#011(Stopped)
Apr 18 02:49:03 server2 pengine: [1180]: notice: LogActions: Leave
resOCFS2r1:0#011(Started server2)
Apr 18 02:49:03 server2 pengine: [1180]: notice: LogActions: Leave
resOCFS2r1:1#011(Stopped)
Apr 18 02:49:03 server2 crmd: [1181]: info: do_state_transition: State
transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
cause=C_IPC_MESSAGE origin=handle_response ]
Apr 18 02:49:03 server2 crmd: [1181]: info: unpack_graph: Unpacked transition
0: 2 actions in 2 synapses
Apr 18 02:49:03 server2 crmd: [1181]: info: do_te_invoke: Processing graph 0
(ref=pe_calc-dc-1429305543-16) derived from /var/lib/pengine/pe-input-1545.bz2
Apr 18 02:49:03 server2 crmd: [1181]: info: te_rsc_command: Initiating action
6: start resXen1_start_0 on server2 (local)
Apr 18 02:49:03 server2 crmd: [1181]: info: do_lrm_rsc_op: Performing
key=6:0:0:11058a2e-0c23-4f0d-a911-65a5eb1b74a0 op=resXen1_start_0 )
Apr 18 02:49:03 server2 lrmd: [1178]: info: rsc:resXen1 start[14] (pid 5929)
Apr 18 02:49:03 server2 pengine: [1180]: notice: process_pe_message: Transition
0: PEngine Input stored in: /var/lib/pengine/pe-input-1545.bz2
Apr 18 02:49:05 server2 kernel: [ 839.612059] o2net: Connection to node
server1 (num 0) at 192.168.0.91:7777 has been idle for 10.12 secs, shutting it
down.
Apr 18 02:49:05 server2 kernel: [ 839.612093] o2net: No longer connected to
node server1 (num 0) at 192.168.0.91:7777
Apr 18 02:49:05 server2 kernel: [ 839.612126]
(xend,5992,1):dlm_send_remote_convert_request:395 ERROR: Error -112 when
sending message 504 (key 0x2d50ec47) to node 0
Apr 18 02:49:05 server2 kernel: [ 839.612134] o2dlm: Waiting on the death of
node 0 in domain 89D0A7DA3B9B43EDB575466F176F7A0C
Apr 18 02:49:15 server2 kernel: [ 849.628073] o2net: No connection established
with node 0 after 10.0 seconds, giving up.
Apr 18 02:49:16 server2 kernel: [ 851.344088] block drbd0: PingAck did not
arrive in time.
Apr 18 02:49:16 server2 kernel: [ 851.344101] block drbd0: peer( Primary ->
Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
susp( 0 -> 1 )
Apr 18 02:49:16 server2 kernel: [ 851.352284] block drbd0: asender terminated
Apr 18 02:49:16 server2 kernel: [ 851.352292] block drbd0: Terminating
drbd0_asender
Apr 18 02:49:16 server2 kernel: [ 851.352361] block drbd0: Connection closed
Apr 18 02:49:16 server2 kernel: [ 851.352405] block drbd0: conn(
NetworkFailure -> Unconnected )
Apr 18 02:49:16 server2 kernel: [ 851.352409] block drbd0: receiver terminated
Apr 18 02:49:16 server2 kernel: [ 851.352411] block drbd0: Restarting
drbd0_receiver
Apr 18 02:49:16 server2 kernel: [ 851.352413] block drbd0: receiver (re)started
Apr 18 02:49:16 server2 kernel: [ 851.352418] block drbd0: conn( Unconnected
-> WFConnection )
Apr 18 02:49:16 server2 kernel: [ 851.352496] block drbd0: helper command:
/sbin/drbdadm fence-peer minor-0
Apr 18 02:49:16 server2 crm-fence-peer.sh[6078]: invoked for r0
Apr 18 02:49:17 server2 cib: [1177]: info: cib:diff: - <cib admin_epoch="0"
epoch="116" num_updates="21" />
Apr 18 02:49:17 server2 cib: [1177]: info: cib:diff: + <cib epoch="117"
num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2"
crm_feature_set="3.0.5" update-origin="server2" update-client="cibadmin"
cib-last-written="Sat Apr 18 02:37:19 2015" have-quorum="0" dc-uuid="server2" >
Apr 18 02:49:17 server2 cib: [1177]: info: cib:diff: + <configuration >
Apr 18 02:49:17 server2 cib: [1177]: info: cib:diff: + <constraints >
Apr 18 02:49:17 server2 cib: [1177]: info: cib:diff: + <rsc_location
rsc="msDRBDr1" id="drbd-fence-by-handler-msDRBDr1"
__crm_diff_marker__="added:top" >
Apr 18 02:49:17 server2 cib: [1177]: info: cib:diff: + <rule
role="Master" score="-INFINITY" id="drbd-fence-by-handler-rule-msDRBDr1" >
Apr 18 02:49:17 server2 cib: [1177]: info: cib:diff: + <expression
attribute="#uname" operation="ne" value="server2"
id="drbd-fence-by-handler-expr-msDRBDr1" />
Apr 18 02:49:17 server2 cib: [1177]: info: cib:diff: + </rule>
Apr 18 02:49:17 server2 cib: [1177]: info: cib:diff: + </rsc_location>
Apr 18 02:49:17 server2 cib: [1177]: info: cib:diff: + </constraints>
Apr 18 02:49:17 server2 cib: [1177]: info: cib:diff: + </configuration>
Apr 18 02:49:17 server2 cib: [1177]: info: cib:diff: + </cib>
Apr 18 02:49:17 server2 crmd: [1181]: info: abort_transition_graph:
te_update_diff:124 - Triggered transition abort (complete=0, tag=diff,
id=(null), magic=NA, cib=0.117.1) : Non-status change
Apr 18 02:49:17 server2 cib: [1177]: info: cib_process_request: Operation
complete: op cib_create for section constraints (origin=local/cibadmin/2,
version=0.117.1): ok (rc=0)
Apr 18 02:49:17 server2 crmd: [1181]: info: update_abort_priority: Abort
priority upgraded from 0 to 1000000
Apr 18 02:49:17 server2 crmd: [1181]: info: update_abort_priority: Abort action
done superceeded by restart
Apr 18 02:49:17 server2 crm-fence-peer.sh[6078]: INFO peer is reachable, my
disk is UpToDate: placed constraint 'drbd-fence-by-handler-msDRBDr1'
Apr 18 02:49:17 server2 kernel: [ 852.446543] block drbd0: helper command:
/sbin/drbdadm fence-peer minor-0 exit code 4 (0x400)
Apr 18 02:49:17 server2 kernel: [ 852.446547] block drbd0: fence-peer helper
returned 4 (peer was fenced)
Apr 18 02:49:17 server2 kernel: [ 852.446553] block drbd0: pdsk( DUnknown ->
Outdated )
Apr 18 02:49:17 server2 kernel: [ 852.446586] block drbd0: new current UUID
1F5EB0210CC2AB1B:A158E0E3B5B108B9:EFB663F37532A93A:EFB563F37532A93B
Apr 18 02:49:17 server2 kernel: [ 852.492924] block drbd0: susp( 1 -> 0 )
Apr 18 02:49:25 server2 kernel: [ 859.644076] o2net: No connection established
with node 0 after 10.0 seconds, giving up.
Apr 18 02:49:25 server2 kernel: [ 859.644112]
(xend,5992,0):dlm_send_remote_convert_request:395 ERROR: Error -107 when
sending message 504 (key 0x2d50ec47) to node 0
Apr 18 02:49:25 server2 kernel: [ 859.644120] o2dlm: Waiting on the death of
node 0 in domain 89D0A7DA3B9B43EDB575466F176F7A0C
Apr 18 02:49:30 server2 kernel: [ 865.452111] o2net: Connection to node
server1 (num 0) at 192.168.0.91:7777 shutdown, state 7
Apr 18 02:49:33 server2 kernel: [ 868.452122] o2net: Connection to node
server1 (num 0) at 192.168.0.91:7777 shutdown, state 7
Apr 18 02:49:35 server2 kernel: [ 869.660076] o2net: No connection established
with node 0 after 10.0 seconds, giving up.
Apr 18 02:49:35 server2 kernel: [ 869.660116]
(xend,5992,0):dlm_send_remote_convert_request:395 ERROR: Error -107 when
sending message 504 (key 0x2d50ec47) to node 0
Apr 18 02:49:35 server2 kernel: [ 869.660123] o2dlm: Waiting on the death of
node 0 in domain 89D0A7DA3B9B43EDB575466F176F7A0C
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user