SLES 11 SP3 + online updates (pacemaker-1.1.11-0.8.11.70, openais-1.1.4-5.22.1.7)

It's a dual-primary DRBD cluster which mounts a file system resource on both cluster nodes simultaneously (the file system type is OCFS2).
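For context, the DRBD resource r0 is configured for dual-primary operation; a minimal sketch of such a resource definition looks roughly like the following (this is only an illustration -- the backing disks, replication addresses and port shown here are placeholders, not my actual r0.res):

# /etc/drbd.d/r0.res -- illustrative sketch only
resource r0 {
    net {
        allow-two-primaries yes;        # needed so OCFS2 can be mounted on both nodes
    }
    on node1 {
        device    /dev/drbd0;
        disk      /dev/sdX1;            # placeholder backing device
        address   192.168.10.1:7789;    # placeholder replication IP:port
        meta-disk internal;
    }
    on node2 {
        device    /dev/drbd0;
        disk      /dev/sdX1;            # placeholder backing device
        address   192.168.10.2:7789;    # placeholder replication IP:port
        meta-disk internal;
    }
}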

Whenever either node goes down, the file system (/sharedata) becomes inaccessible for exactly 35 seconds on the other (surviving/online) node, and then becomes available again on that node.

Please help me understand why the node which survives or remains online is unable to access the file system resource (/sharedata) for 35 seconds, and how I can fix the cluster so that the file system remains accessible on the surviving node without any interruption/delay (in my case, about 35 seconds).

By "inaccessible" I mean that running "ls -l /sharedata" or "df /sharedata" on the online node returns no output and does not give the prompt back for exactly 35 seconds once the other node goes offline.
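For illustration, this is roughly how I watch the stall from the surviving node (a simple shell loop; the output just stops for about 35 seconds while the commands block):

# run on the surviving node while the other node is taken down
while true; do
    printf '%s ' "$(date +%H:%M:%S)"
    ls -l /sharedata > /dev/null && df /sharedata > /dev/null && echo ok
    sleep 1
done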

e.g "node1" got offline somewhere around 01:37:15, and then /sharedata file system was inaccessible during 01:37:35 and 01:38:18 on the online node i.e "node2".


/var/log/messages on node2, when node1 went offline:
Jul  5 01:37:26 node2 kernel: [  675.255865] drbd r0: PingAck did not arrive in time.
Jul  5 01:37:26 node2 kernel: [  675.255886] drbd r0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
Jul  5 01:37:26 node2 kernel: [  675.256030] block drbd0: new current UUID C23D1458962AD18D:A8DD404C9F563391:6A5F4A26F64BAF0B:6A5E4A26F64BAF0B
Jul  5 01:37:26 node2 kernel: [  675.256079] drbd r0: asender terminated
Jul  5 01:37:26 node2 kernel: [  675.256081] drbd r0: Terminating drbd_a_r0
Jul  5 01:37:26 node2 kernel: [  675.256306] drbd r0: Connection closed
Jul  5 01:37:26 node2 kernel: [  675.256338] drbd r0: conn( NetworkFailure -> Unconnected )
Jul  5 01:37:26 node2 kernel: [  675.256339] drbd r0: receiver terminated
Jul  5 01:37:26 node2 kernel: [  675.256340] drbd r0: Restarting receiver thread
Jul  5 01:37:26 node2 kernel: [  675.256341] drbd r0: receiver (re)started
Jul  5 01:37:26 node2 kernel: [  675.256344] drbd r0: conn( Unconnected -> WFConnection )
Jul  5 01:37:29 node2 corosync[4040]:  [TOTEM ] A processor failed, forming new configuration.
Jul  5 01:37:35 node2 corosync[4040]:  [CLM   ] CLM CONFIGURATION CHANGE
Jul  5 01:37:35 node2 corosync[4040]:  [CLM   ] New Configuration:
Jul  5 01:37:35 node2 corosync[4040]:  [CLM   ]     r(0) ip(172.16.241.132)
Jul  5 01:37:35 node2 corosync[4040]:  [CLM   ] Members Left:
Jul  5 01:37:35 node2 corosync[4040]:  [CLM   ]     r(0) ip(172.16.241.131)
Jul  5 01:37:35 node2 corosync[4040]:  [CLM   ] Members Joined:
Jul  5 01:37:35 node2 corosync[4040]:  [pcmk  ] notice: pcmk_peer_update: Transitional membership event on ring 216: memb=1, new=0, lost=1
Jul  5 01:37:35 node2 corosync[4040]:  [pcmk  ] info: pcmk_peer_update: memb: node2 739307908
Jul  5 01:37:35 node2 corosync[4040]:  [pcmk  ] info: pcmk_peer_update: lost: node1 739307907
Jul  5 01:37:35 node2 corosync[4040]:  [CLM   ] CLM CONFIGURATION CHANGE
Jul  5 01:37:35 node2 corosync[4040]:  [CLM   ] New Configuration:
Jul  5 01:37:35 node2 corosync[4040]:  [CLM   ]     r(0) ip(172.16.241.132)
Jul  5 01:37:35 node2 cluster-dlm[4344]: notice: plugin_handle_membership: Membership 216: quorum lost
Jul  5 01:37:35 node2 ocfs2_controld[4473]: notice: plugin_handle_membership: Membership 216: quorum lost
Jul  5 01:37:35 node2 corosync[4040]:  [CLM   ] Members Left:
Jul  5 01:37:35 node2 crmd[4050]: notice: plugin_handle_membership: Membership 216: quorum lost
Jul  5 01:37:35 node2 stonith-ng[4046]: notice: plugin_handle_membership: Membership 216: quorum lost
Jul  5 01:37:35 node2 cib[4045]: notice: plugin_handle_membership: Membership 216: quorum lost
Jul  5 01:37:35 node2 cluster-dlm[4344]: notice: crm_update_peer_state: plugin_handle_membership: Node node1[739307907] - state is now lost (was member)
Jul  5 01:37:35 node2 ocfs2_controld[4473]: notice: crm_update_peer_state: plugin_handle_membership: Node node1[739307907] - state is now lost (was member)
Jul  5 01:37:35 node2 corosync[4040]:  [CLM   ] Members Joined:
Jul  5 01:37:35 node2 crmd[4050]: warning: match_down_event: No match for shutdown action on node1
Jul  5 01:37:35 node2 stonith-ng[4046]: notice: crm_update_peer_state: plugin_handle_membership: Node node1[739307907] - state is now lost (was member)
Jul  5 01:37:35 node2 cib[4045]: notice: crm_update_peer_state: plugin_handle_membership: Node node1[739307907] - state is now lost (was member)
Jul  5 01:37:35 node2 cluster-dlm[4344]: update_cluster: Processing membership 216
Jul  5 01:37:35 node2 ocfs2_controld[4473]: confchg called
Jul  5 01:37:35 node2 corosync[4040]:  [pcmk  ] notice: pcmk_peer_update: Stable membership event on ring 216: memb=1, new=0, lost=0
Jul  5 01:37:35 node2 crmd[4050]: notice: peer_update_callback: Stonith/shutdown of node1 not matched
Jul  5 01:37:35 node2 cluster-dlm[4344]: dlm_process_node: Skipped active node 739307908: born-on=204, last-seen=216, this-event=216, last-event=212
Jul  5 01:37:35 node2 ocfs2_controld[4473]: ocfs2_controld (group "ocfs2:controld") confchg: members 1, left 1, joined 0
Jul  5 01:37:35 node2 corosync[4040]:  [pcmk  ] info: pcmk_peer_update: MEMB: node2 739307908
Jul  5 01:37:35 node2 crmd[4050]: notice: crm_update_peer_state: plugin_handle_membership: Node node1[739307907] - state is now lost (was member)
Jul  5 01:37:35 node2 cluster-dlm[4344]: del_configfs_node: del_configfs_node rmdir "/sys/kernel/config/dlm/cluster/comms/739307907"
Jul  5 01:37:35 node2 ocfs2_controld[4473]: node daemon left 739307907
Jul  5 01:37:35 node2 corosync[4040]:  [pcmk  ] info: ais_mark_unseen_peer_dead: Node node1 was not seen in the previous transition
Jul  5 01:37:35 node2 crmd[4050]: warning: match_down_event: No match for shutdown action on node1
Jul  5 01:37:35 node2 cluster-dlm[4344]: dlm_process_node: Removed inactive node 739307907: born-on=212, last-seen=212, this-event=216, last-event=212
Jul  5 01:37:35 node2 ocfs2_controld[4473]: node down 739307907
Jul  5 01:37:35 node2 corosync[4040]:  [pcmk  ] info: update_member: Node 739307907/node1 is now: lost
Jul  5 01:37:35 node2 crmd[4050]: notice: peer_update_callback: Stonith/shutdown of node1 not matched
Jul  5 01:37:35 node2 cluster-dlm[4344]: log_config: dlm:controld conf 1 0 1 memb 739307908 join left 739307907
Jul  5 01:37:35 node2 ocfs2_controld[4473]: Node 739307907 has left mountgroup 12925001B70F4A529491DFF16F8B8352
Jul  5 01:37:35 node2 corosync[4040]:  [pcmk  ] info: send_member_notification: Sending membership update 216 to 5 children
Jul  5 01:37:35 node2 sbd: [4028]: WARN: CIB: We do NOT have quorum!
Jul  5 01:37:35 node2 sbd: [4026]: WARN: Pacemaker health check: UNHEALTHY
Jul  5 01:37:35 node2 cluster-dlm[4344]: log_config: dlm:ls:12925001B70F4A529491DFF16F8B8352 conf 1 0 1 memb 739307908 join left 739307907
Jul  5 01:37:35 node2 corosync[4040]:  [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jul  5 01:37:35 node2 corosync[4040]:  [CPG   ] chosen downlist: sender r(0) ip(172.16.241.132) ; members(old:2 left:1)
Jul  5 01:37:35 node2 cluster-dlm[4344]: node_history_fail: 12925001B70F4A529491DFF16F8B8352 check_fs nodeid 739307907 set
Jul  5 01:37:35 node2 cluster-dlm[4344]: add_change: 12925001B70F4A529491DFF16F8B8352 add_change cg 4 remove nodeid 739307907 reason 3
Jul  5 01:37:35 node2 corosync[4040]:  [MAIN  ] Completed service synchronization, ready to provide service.
Jul  5 01:37:35 node2 cluster-dlm[4344]: add_change: 12925001B70F4A529491DFF16F8B8352 add_change cg 4 counts member 1 joined 0 remove 1 failed 1
Jul  5 01:37:35 node2 cluster-dlm[4344]: stop_kernel: 12925001B70F4A529491DFF16F8B8352 stop_kernel cg 4
Jul  5 01:37:35 node2 ocfs2_controld[4473]: Sending notification of node 739307907 for "12925001B70F4A529491DFF16F8B8352"
Jul  5 01:37:35 node2 ocfs2_controld[4473]: confchg called
Jul  5 01:37:35 node2 cluster-dlm[4344]: do_sysfs: write "0" to "/sys/kernel/dlm/12925001B70F4A529491DFF16F8B8352/control"
Jul  5 01:37:35 node2 ocfs2_controld[4473]: group "ocfs2:12925001B70F4A529491DFF16F8B8352" confchg: members 1, left 1, joined 0
Jul  5 01:37:35 node2 kernel: [  684.452042] dlm: closing connection to node 739307907
Jul  5 01:37:35 node2 crmd[4050]: notice: crm_update_quorum: Updating quorum status to false (call=114)
Jul  5 01:37:35 node2 cluster-dlm[4344]: fence_node_time: Node 739307907/node1 has not been shot yet
Jul  5 01:37:35 node2 crmd[4050]: notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=check_join_state ]
Jul  5 01:37:35 node2 cluster-dlm[4344]: check_fencing_done: 12925001B70F4A529491DFF16F8B8352 check_fencing 739307907 wait add 1436042043 fail 1436042255 last 0
Jul  5 01:37:35 node2 crmd[4050]: notice: crm_update_quorum: Updating quorum status to false (call=119)
Jul  5 01:37:35 node2 attrd[4048]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Jul  5 01:37:35 node2 attrd[4048]: notice: attrd_trigger_update: Sending flush op to all hosts for: shutdown (0)
Jul  5 01:37:35 node2 pengine[4049]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jul  5 01:37:35 node2 pengine[4049]: warning: pe_fence_node: Node node1 will be fenced because the node is no longer part of the cluster
Jul  5 01:37:35 node2 pengine[4049]: warning: determine_online_status: Node node1 is unclean
Jul  5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-drbd:1_demote_0 on node1 is unrunnable (offline)
Jul  5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-drbd:1_stop_0 on node1 is unrunnable (offline)
Jul  5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-drbd:1_demote_0 on node1 is unrunnable (offline)
Jul  5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-drbd:1_stop_0 on node1 is unrunnable (offline)
Jul  5 01:37:35 node2 attrd[4048]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p-drbd (10000)
Jul  5 01:37:35 node2 cluster-dlm[4344]: set_fs_notified: 12925001B70F4A529491DFF16F8B8352 set_fs_notified nodeid 739307907
Jul  5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-drbd:1_demote_0 on node1 is unrunnable (offline)
Jul  5 01:37:35 node2 attrd[4048]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Jul  5 01:37:35 node2 ocfs2_controld[4473]: message from dlmcontrol
Jul  5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-drbd:1_stop_0 on node1 is unrunnable (offline)
Jul  5 01:37:35 node2 ocfs2_controld[4473]: Notified for "12925001B70F4A529491DFF16F8B8352", node 739307907, status 0
Jul  5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-drbd:1_demote_0 on node1 is unrunnable (offline)
Jul  5 01:37:35 node2 ocfs2_controld[4473]: Completing notification on "12925001B70F4A529491DFF16F8B8352" for node 739307907
Jul  5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-drbd:1_stop_0 on node1 is unrunnable (offline)
Jul  5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-dlm:1_stop_0 on node1 is unrunnable (offline)
Jul  5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-dlm:1_stop_0 on node1 is unrunnable (offline)
Jul  5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-o2cb:1_stop_0 on node1 is unrunnable (offline)
Jul  5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-o2cb:1_stop_0 on node1 is unrunnable (offline)
Jul  5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-vg1:1_stop_0 on node1 is unrunnable (offline)
Jul  5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-vg1:1_stop_0 on node1 is unrunnable (offline)
Jul  5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-sharefs:1_stop_0 on node1 is unrunnable (offline)
Jul  5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-sharefs:1_stop_0 on node1 is unrunnable (offline)
Jul  5 01:37:35 node2 pengine[4049]: warning: stage6: Scheduling Node node1 for STONITH
Jul  5 01:37:35 node2 pengine[4049]: error: rsc_expand_action: Couldn't expand cl-lock_demote_0
Jul  5 01:37:35 node2 pengine[4049]: error: crm_abort: clone_update_actions_interleave: Triggered assert at clone.c:1260 : first_action != NULL || is_set(first_child->flags, pe_rsc_orphan)
Jul  5 01:37:35 node2 pengine[4049]: error: clone_update_actions_interleave: No action found for demote in g-lock:0 (first)
Jul  5 01:37:35 node2 pengine[4049]: error: crm_abort: clone_update_actions_interleave: Triggered assert at clone.c:1260 : first_action != NULL || is_set(first_child->flags, pe_rsc_orphan)
Jul  5 01:37:35 node2 pengine[4049]: error: clone_update_actions_interleave: No action found for demote in g-lock:0 (first)
Jul  5 01:37:35 node2 pengine[4049]: error: crm_abort: clone_update_actions_interleave: Triggered assert at clone.c:1260 : first_action != NULL || is_set(first_child->flags, pe_rsc_orphan)
Jul  5 01:37:35 node2 pengine[4049]: error: clone_update_actions_interleave: No action found for demote in g-lock:0 (first)
Jul  5 01:37:35 node2 pengine[4049]: error: crm_abort: clone_update_actions_interleave: Triggered assert at clone.c:1260 : first_action != NULL || is_set(first_child->flags, pe_rsc_orphan)
Jul  5 01:37:35 node2 pengine[4049]: error: clone_update_actions_interleave: No action found for demote in g-lock:0 (first)
Jul  5 01:37:35 node2 pengine[4049]: notice: LogActions: Demote p-drbd:1 (Master -> Stopped node1)
Jul  5 01:37:35 node2 pengine[4049]: notice: LogActions: Stop p-dlm:1 (node1)
Jul  5 01:37:35 node2 pengine[4049]: notice: LogActions: Stop p-o2cb:1 (node1)
Jul  5 01:37:35 node2 pengine[4049]: notice: LogActions: Stop p-vg1:1 (node1)
Jul  5 01:37:35 node2 pengine[4049]: notice: LogActions: Stop p-sharefs:1 (node1)
Jul  5 01:37:35 node2 pengine[4049]: warning: process_pe_message: Calculated Transition 6: /var/lib/pacemaker/pengine/pe-warn-5.bz2
Jul  5 01:37:35 node2 crmd[4050]: notice: do_te_invoke: Processing graph 6 (ref=pe_calc-dc-1436042255-74) derived from /var/lib/pacemaker/pengine/pe-warn-5.bz2
Jul  5 01:37:35 node2 crmd[4050]: notice: te_fence_node: Executing reboot fencing operation (79) on node1 (timeout=90000)
Jul  5 01:37:35 node2 crmd[4050]: notice: te_rsc_command: Initiating action 95: notify p-drbd_pre_notify_demote_0 on node2 (local)
Jul  5 01:37:35 node2 stonith-ng[4046]: notice: handle_request: Client crmd.4050.3c8ccdb0 wants to fence (reboot) 'node1' with device '(any)'
Jul  5 01:37:35 node2 stonith-ng[4046]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for node1: 2fce4697-65d5-40d2-b5b4-5c10be4d3d5a (0)
Jul  5 01:37:35 node2 crmd[4050]: notice: process_lrm_event: LRM operation p-drbd_notify_0 (call=54, rc=0, cib-update=0, confirmed=true) ok
Jul  5 01:37:35 node2 stonith-ng[4046]: notice: can_fence_host_with_device: sbd_stonith can fence node1: dynamic-list
Jul  5 01:37:35 node2 sbd: [6197]: info: Delivery process handling /dev/disk/by-id/scsi-1494554000000000035c71550e83586b70dfc77c3b382dfbd
Jul  5 01:37:35 node2 sbd: [6197]: info: Device UUID: 4e398182-4894-4f11-b84e-6c6ad4c02614
Jul  5 01:37:35 node2 sbd: [6197]: info: Writing reset to node slot node1
Jul  5 01:37:35 node2 sbd: [6197]: info: Messaging delay: 40
Jul  5 01:38:15 node2 sbd: [6197]: info: reset successfully delivered to node1
Jul  5 01:38:15 node2 sbd: [6196]: info: Message successfully delivered.
Jul  5 01:38:16 node2 stonith-ng[4046]: notice: log_operation: Operation 'reboot' [6185] (call 3 from crmd.4050) for host 'node1' with device 'sbd_stonith' returned: 0 (OK)
Jul  5 01:38:16 node2 stonith-ng[4046]: notice: remote_op_done: Operation reboot of node1 by node2 for crmd.4050@node2.2fce4697: OK
Jul  5 01:38:16 node2 crmd[4050]: notice: tengine_stonith_callback: Stonith operation 3/79:6:0:34e9a1ca-ebc9-4114-b5b7-26e670dc6aa0: OK (0)
Jul  5 01:38:16 node2 crmd[4050]: notice: tengine_stonith_notify: Peer node1 was terminated (reboot) by node2 for node2: OK (ref=2fce4697-65d5-40d2-b5b4-5c10be4d3d5a) by client crmd.4050
Jul  5 01:38:16 node2 crmd[4050]: notice: te_rsc_command: Initiating action 96: notify p-drbd_post_notify_demote_0 on node2 (local)
Jul  5 01:38:16 node2 crmd[4050]: notice: process_lrm_event: LRM operation p-drbd_notify_0 (call=55, rc=0, cib-update=0, confirmed=true) ok
Jul  5 01:38:16 node2 crmd[4050]: notice: run_graph: Transition 6 (Complete=24, Pending=0, Fired=0, Skipped=6, Incomplete=5, Source=/var/lib/pacemaker/pengine/pe-warn-5.bz2): Stopped
Jul  5 01:38:16 node2 pengine[4049]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jul  5 01:38:16 node2 pengine[4049]: error: rsc_expand_action: Couldn't expand cl-lock_demote_0
Jul  5 01:38:16 node2 pengine[4049]: error: crm_abort: clone_update_actions_interleave: Triggered assert at clone.c:1260 : first_action != NULL || is_set(first_child->flags, pe_rsc_orphan)
Jul  5 01:38:16 node2 pengine[4049]: error: clone_update_actions_interleave: No action found for demote in g-lock:0 (first)
Jul  5 01:38:16 node2 pengine[4049]: error: crm_abort: clone_update_actions_interleave: Triggered assert at clone.c:1260 : first_action != NULL || is_set(first_child->flags, pe_rsc_orphan)
Jul  5 01:38:16 node2 pengine[4049]: error: clone_update_actions_interleave: No action found for demote in g-lock:0 (first)
Jul  5 01:38:16 node2 pengine[4049]: error: crm_abort: clone_update_actions_interleave: Triggered assert at clone.c:1260 : first_action != NULL || is_set(first_child->flags, pe_rsc_orphan)
Jul  5 01:38:16 node2 pengine[4049]: error: clone_update_actions_interleave: No action found for demote in g-lock:0 (first)
Jul  5 01:38:16 node2 pengine[4049]: error: crm_abort: clone_update_actions_interleave: Triggered assert at clone.c:1260 : first_action != NULL || is_set(first_child->flags, pe_rsc_orphan)
Jul  5 01:38:16 node2 pengine[4049]: error: clone_update_actions_interleave: No action found for demote in g-lock:0 (first)
Jul  5 01:38:16 node2 pengine[4049]: notice: process_pe_message: Calculated Transition 7: /var/lib/pacemaker/pengine/pe-input-30.bz2
Jul  5 01:38:16 node2 crmd[4050]: notice: do_te_invoke: Processing graph 7 (ref=pe_calc-dc-1436042296-79) derived from /var/lib/pacemaker/pengine/pe-input-30.bz2
Jul  5 01:38:16 node2 crmd[4050]: notice: run_graph: Transition 7 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-30.bz2): Complete
Jul  5 01:38:16 node2 crmd[4050]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Jul  5 01:38:17 node2 cluster-dlm[4344]: fence_node_time: Node 739307907/node1 was last shot 'now'
Jul  5 01:38:17 node2 cluster-dlm[4344]: check_fencing_done: 12925001B70F4A529491DFF16F8B8352 check_fencing 739307907 done add 1436042043 fail 1436042255 last 1436042297
Jul  5 01:38:17 node2 cluster-dlm[4344]: check_fencing_done: 12925001B70F4A529491DFF16F8B8352 check_fencing done
Jul  5 01:38:17 node2 cluster-dlm[4344]: check_quorum_done: 12925001B70F4A529491DFF16F8B8352 check_quorum disabled
Jul  5 01:38:17 node2 cluster-dlm[4344]: check_fs_done: 12925001B70F4A529491DFF16F8B8352 check_fs nodeid 739307907 clear
Jul  5 01:38:17 node2 cluster-dlm[4344]: check_fs_done: 12925001B70F4A529491DFF16F8B8352 check_fs done
Jul  5 01:38:17 node2 cluster-dlm[4344]: send_info: 12925001B70F4A529491DFF16F8B8352 send_start cg 4 flags 2 data2 0 counts 3 1 0 1 1
Jul  5 01:38:17 node2 cluster-dlm[4344]: receive_start: 12925001B70F4A529491DFF16F8B8352 receive_start 739307908:4 len 76
Jul  5 01:38:17 node2 cluster-dlm[4344]: match_change: 12925001B70F4A529491DFF16F8B8352 match_change 739307908:4 matches cg 4
Jul  5 01:38:17 node2 cluster-dlm[4344]: wait_messages_done: 12925001B70F4A529491DFF16F8B8352 wait_messages cg 4 got all 1
Jul  5 01:38:17 node2 cluster-dlm[4344]: start_kernel: 12925001B70F4A529491DFF16F8B8352 start_kernel cg 4 member_count 1
Jul  5 01:38:17 node2 cluster-dlm[4344]: update_dir_members: dir_member 739307907
Jul  5 01:38:17 node2 cluster-dlm[4344]: update_dir_members: dir_member 739307908
Jul  5 01:38:17 node2 cluster-dlm[4344]: set_configfs_members: set_members rmdir "/sys/kernel/config/dlm/cluster/spaces/12925001B70F4A529491DFF16F8B8352/nodes/739307907"
Jul  5 01:38:17 node2 cluster-dlm[4344]: do_sysfs: write "1" to "/sys/kernel/dlm/12925001B70F4A529491DFF16F8B8352/control"
Jul  5 01:38:17 node2 kernel: [  726.576130] (ocfs2rec,6152,0):ocfs2_replay_journal:1549 Recovering node 739307907 from slot 0 on device (253,0)
Jul  5 01:38:18 node2 kernel: [  727.453597] (ocfs2rec,6152,0):ocfs2_begin_quota_recovery:407 Beginning quota recovery in slot 0
Jul  5 01:38:18 node2 kernel: [  727.455965] (kworker/u:28,139,0):ocfs2_finish_quota_recovery:599 Finishing quota recovery in slot 0
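(Side note: the "Messaging delay: 40" reported by sbd above presumably corresponds to the msgwait timeout written into the SBD device header; if I am not mistaken it can be checked with the dump command:)

# print the timeouts stored in the SBD header (watchdog, allocate, loop, msgwait)
sbd -d /dev/disk/by-id/scsi-1494554000000000035c71550e83586b70dfc77c3b382dfbd dump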


cib:
primitive p-dlm ocf:pacemaker:controld \
        op monitor interval="120" timeout="30" \
        op start interval="0" timeout="90" \
        op stop interval="0" timeout="100"
primitive p-drbd ocf:linbit:drbd \
        params drbd_resource="r0" \
        op monitor interval="50" role="Master" timeout="30" \
        op monitor interval="60" role="Slave" timeout="30" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100"
primitive p-o2cb ocf:ocfs2:o2cb \
        op monitor interval="120" timeout="30" \
        op start interval="0" timeout="90" \
        op stop interval="0" timeout="100"
primitive p-sharefs ocf:heartbeat:Filesystem \
params device="/dev/vg1/sharelv" directory="/sharedata" fstype="ocfs2" \
        op monitor interval="60s" timeout="60s" \
        op start interval="0" timeout="90s" \
        op stop interval="0" timeout="90s"
primitive p-vg1 ocf:heartbeat:LVM \
        params volgrpname="vg1" \
        op monitor interval="60s" timeout="40" \
        op start interval="0" timeout="40" \
        op stop interval="0" timeout="40"
primitive sbd_stonith stonith:external/sbd \
        meta target-role="Started" \
        op monitor interval="3000" timeout="20" \
        op start interval="0" timeout="20" \
        op stop interval="0" timeout="20" \
params sbd_device="/dev/disk/by-id/scsi-1494554000000000035c71550e83586b70dfc77c3b382dfbd"
group g-lock p-dlm p-o2cb
group g-sharedata p-vg1 p-sharefs
ms ms-drbd p-drbd \
        meta master-max="2" clone-max="2" notify="true" interleave="true"
clone cl-lock g-lock \
        meta globally-unique="false" interleave="true"
clone cl-sharedata g-sharedata \
        meta globally-unique="false" interleave="true"
colocation col-drbd-lock inf: cl-lock ms-drbd:Master
colocation col-lock-sharedata inf: cl-sharedata cl-lock
order ord-drbd-lock inf: ms-drbd:promote cl-lock
order ord-lock-sharedata inf: cl-lock cl-sharedata
        stonith-timeout="90s" \
        no-quorum-policy="ignore" \
        expected-quorum-votes="2"


--
Regards,

Muhammad Sharfuddin