Hi list,
I have a test Ceph cluster with 3 nodes (node0: mon; node1: osd and NFS server 1; node2: osd and NFS server 2).
OS: CentOS 6.6, kernel: 3.10.94-1.el6.elrepo.x86_64, Ceph version: 0.94.5.
I followed the instructions at http://www.sebastien-han.fr/blog/2012/07/06/nfs-over-rbd/ to set up an active/standby NFS environment.
When I stop node1 cleanly ("service corosync stop" or "poweroff"), failover works fine and node2 takes over the NFS service. But when I test by cutting off node1's power, the failover fails.
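For reference, the Pacemaker resources follow the layout from that blog post; roughly like this (the pool/image/mount point/IP values below are placeholders, not my exact config):

  primitive p_rbd_map_1 ocf:ceph:rbd.in \
    params user="admin" pool="rbd" name="share1" cephconf="/etc/ceph/ceph.conf" \
    op monitor interval="10s" timeout="20s"
  primitive p_fs_rbd_1 ocf:heartbeat:Filesystem \
    params device="/dev/rbd/rbd/share1" directory="/mnt/share1" fstype="xfs" \
    op monitor interval="20s" timeout="40s"
  primitive p_export_rbd_1 ocf:heartbeat:exportfs \
    params directory="/mnt/share1" clientspec="192.168.0.0/24" fsid="1" \
    op monitor interval="10s" timeout="20s"
  primitive p_vip_1 ocf:heartbeat:IPaddr \
    params ip="192.168.0.100" cidr_netmask="24" \
    op monitor interval="10s"
  group g_rbd_share_1 p_rbd_map_1 p_fs_rbd_1 p_export_rbd_1 p_vip_1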
1. [root@node1 ~]# crm status
Last updated: Fri Dec 18 17:14:19 2015
Last change: Fri Dec 18 17:13:29 2015
Stack: classic openais (with plugin)
Current DC: node1 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 3 expected votes
8 Resources configured
Online: [ node1 node2 ]
Resource Group: g_rbd_share_1
p_rbd_map_1 (ocf::ceph:rbd.in): Started node1
p_fs_rbd_1 (ocf::heartbeat:Filesystem): Started node1
p_export_rbd_1 (ocf::heartbeat:exportfs): Started node1
p_vip_1 (ocf::heartbeat:IPaddr): Started node1
Clone Set: clo_nfs [g_nfs]
Started: [ node1 node2 ]
2. [root@node1 ~]# service corosync stop
[root@node2 cluster]# crm status
Last updated: Fri Dec 18 17:14:59 2015
Last change: Fri Dec 18 17:13:29 2015
Stack: classic openais (with plugin)
Current DC: node2 - partition WITHOUT quorum
Version: 1.1.11-97629de
2 Nodes configured, 3 expected votes
8 Resources configured
Online: [ node2 ]
OFFLINE: [ node1 ]
Resource Group: g_rbd_share_1
p_rbd_map_1 (ocf::ceph:rbd.in): Started node2
p_fs_rbd_1 (ocf::heartbeat:Filesystem): Started node2
p_export_rbd_1 (ocf::heartbeat:exportfs): Started node2
p_vip_1 (ocf::heartbeat:IPaddr): Started node2
Clone Set: clo_nfs [g_nfs]
Started: [ node2 ]
Stopped: [ node1 ]
3. Cut off node1's power manually:
[root@node2 cluster]# crm status
Last updated: Fri Dec 18 17:23:06 2015
Last change: Fri Dec 18 17:13:29 2015
Stack: classic openais (with plugin)
Current DC: node2 - partition WITHOUT quorum
Version: 1.1.11-97629de
2 Nodes configured, 3 expected votes
8 Resources configured
Online: [ node2 ]
OFFLINE: [ node1 ]
Clone Set: clo_nfs [g_nfs]
Started: [ node2 ]
Stopped: [ node1 ]
Failed actions:
p_rbd_map_1_start_0 on node2 'unknown error' (1): call=48, status=Timed
Out, last-rc-change='Fri Dec 18 17:22:19 2015', queued=0ms, exec=20002ms
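As far as I understand, the start operation that times out here is the rbd.in agent mapping the image, i.e. roughly the equivalent of running this by hand on node2 (pool/image names are placeholders):

  rbd map rbd/share1 --id admin --keyring /etc/ceph/ceph.client.admin.keyring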
corosync.log:
Dec 18 17:22:39 [2692] node2 lrmd: warning: child_timeout_callback:
p_rbd_map_1_start_0 process (PID 11010) timed out
Dec 18 17:22:39 [2692] node2 lrmd: warning: operation_finished:
p_rbd_map_1_start_0:11010 - timed out after 20000ms
Dec 18 17:22:39 [2692] node2 lrmd: notice: operation_finished:
p_rbd_map_1_start_0:11010:stderr [ libust[11019/11019]: Warning: HOME
environment variable not set. Disabling LTTng-UST per-user tracing. (in
setup_local_apps() at lttng-ust-comm.c:305) ]
Dec 18 17:22:39 [2692] node2 lrmd: info: log_finished: finished -
rsc:p_rbd_map_1 action:start call_id:48 pid:11010 exit-code:1 exec-time:20002ms
queue-time:0ms
Dec 18 17:22:39 [2695] node2 crmd: info: services_os_action_execute:
Managed rbd.in_meta-data_0 process 11117 exited with rc=0
Dec 18 17:22:39 [2695] node2 crmd: error: process_lrm_event:
Operation p_rbd_map_1_start_0: Timed Out (node=node2, call=48, timeout=20000ms)
Dec 18 17:22:39 [2695] node2 crmd: notice: process_lrm_event:
node2-p_rbd_map_1_start_0:48 [ libust[11019/11019]: Warning: HOME environment
variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps()
at lttng-ust-comm.c:305)\n ]
Dec 18 17:22:39 [2690] node2 cib: info: cib_process_request:
Forwarding cib_modify operation for section status to master
(origin=local/crmd/99)
Dec 18 17:22:39 [2690] node2 cib: info: cib_perform_op: Diff:
--- 0.69.161 2
Dec 18 17:22:39 [2690] node2 cib: info: cib_perform_op: Diff:
+++ 0.69.162 (null)
Dec 18 17:22:39 [2690] node2 cib: info: cib_perform_op: + /cib:
@num_updates=162
Dec 18 17:22:39 [2690] node2 cib: info: cib_perform_op: +
/cib/status/node_state[@id='node2']: @crm-debug-origin=do_update_resource
Dec 18 17:22:39 [2690] node2 cib: info: cib_perform_op: +
/cib/status/node_state[@id='node2']/lrm[@id='node2']/lrm_resources/lrm_resource[@id='p_rbd_map_1']/lrm_rsc_op[@id='p_rbd_map_1_last_0']:
@operation_key=p_rbd_map_1_start_0, @operation=start,
@transition-key=6:3:0:1b17b95d-a029-4ea5-be6d-4e5d8add6ca9,
@transition-magic=2:1;6:3:0:1b17b95d-a029-4ea5-be6d-4e5d8add6ca9, @call-id=48,
@rc-code=1, @op-status=2, @last-run=1450430539, @last-rc-change=1450430539,
@exec-time=20002
Dec 18 17:22:39 [2690] node2 cib: info: cib_perform_op: ++
/cib/status/node_state[@id='node2']/lrm[@id='node2']/lrm_resources/lrm_resource[@id='p_rbd_map_1']:
<lrm_rsc_op id="p_rbd_map_1_last_failure_0"
operation_key="p_rbd_map_1_start_0" operation="start"
crm-debug-origin="do_update_resource" crm_feature_set="3.0.9"
transition-key="6:3:0:1b17b95d-a029-4ea5-be6d-4e5d8add6ca9"
transition-magic="2:1;6:3:0:1b17b95d-a029-4ea5-be6d-4e5d8add6ca9" call-id="48"
rc-code="1" op-status="2" interval="0" l
Dec 18 17:22:39 [2690] node2 cib: info: cib_process_request:
Completed cib_modify operation for section status: OK (rc=0,
origin=node2/crmd/99, version=0.69.162)
Dec 18 17:22:39 [2695] node2 crmd: warning: status_from_rc: Action 6
(p_rbd_map_1_start_0) on node2 failed (target: 0 vs. rc: 1): Error
Dec 18 17:22:39 [2695] node2 crmd: warning: update_failcount:
Updating failcount for p_rbd_map_1 on node2 after failed start: rc=1
(update=INFINITY, time=1450430559)
Dec 18 17:22:39 [2695] node2 crmd: notice: abort_transition_graph:
Transition aborted by p_rbd_map_1_start_0 'modify' on node2: Event failed
(magic=2:1;6:3:0:1b17b95d-a029-4ea5-be6d-4e5d8add6ca9, cib=0.69.162,
source=match_graph_event:344, 0)
Dec 18 17:22:39 [2695] node2 crmd: info: match_graph_event:
Action p_rbd_map_1_start_0 (6) confirmed on node2 (rc=4)
Dec 18 17:22:39 [2693] node2 attrd: notice: attrd_trigger_update:
Sending flush op to all hosts for: fail-count-p_rbd_map_1 (INFINITY)
Dec 18 17:22:39 [2695] node2 crmd: warning: update_failcount:
Updating failcount for p_rbd_map_1 on node2 after failed start: rc=1
(update=INFINITY, time=1450430559)
Dec 18 17:22:39 [2695] node2 crmd: info: process_graph_event:
Detected action (3.6) p_rbd_map_1_start_0.48=unknown error: failed
Dec 18 17:22:39 [2695] node2 crmd: warning: status_from_rc: Action 6
(p_rbd_map_1_start_0) on node2 failed (target: 0 vs. rc: 1): Error
Dec 18 17:22:39 [2695] node2 crmd: warning: update_failcount:
Updating failcount for p_rbd_map_1 on node2 after failed start: rc=1
(update=INFINITY, time=1450430559)
Dec 18 17:22:39 [2695] node2 crmd: info: abort_transition_graph:
Transition aborted by p_rbd_map_1_start_0 'create' on (null): Event failed
(magic=2:1;6:3:0:1b17b95d-a029-4ea5-be6d-4e5d8add6ca9, cib=0.69.162,
source=match_graph_event:344, 0)
Dec 18 17:22:39 [2695] node2 crmd: info: match_graph_event:
Action p_rbd_map_1_start_0 (6) confirmed on node2 (rc=4)
Dec 18 17:22:39 [2695] node2 crmd: warning: update_failcount:
Updating failcount for p_rbd_map_1 on node2 after failed start: rc=1
(update=INFINITY, time=1450430559)
Dec 18 17:22:39 [2695] node2 crmd: info: process_graph_event:
Detected action (3.6) p_rbd_map_1_start_0.48=unknown error: failed
Dec 18 17:22:39 [2693] node2 attrd: notice: attrd_perform_update:
Sent update 28: fail-count-p_rbd_map_1=INFINITY
Dec 18 17:22:39 [2690] node2 cib: info: cib_process_request:
Forwarding cib_modify operation for section status to master
(origin=local/attrd/28)
Dec 18 17:22:39 [2695] node2 crmd: notice: run_graph: Transition 3
(Complete=2, Pending=0, Fired=0, Skipped=8, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-234.bz2): Stopped
Dec 18 17:22:39 [2695] node2 crmd: info: do_state_transition:
State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC
cause=C_FSA_INTERNAL origin=notify_crmd ]
Dec 18 17:22:39 [2693] node2 attrd: notice: attrd_trigger_update:
Sending flush op to all hosts for: last-failure-p_rbd_map_1 (1450430559)
Dec 18 17:22:39 [2690] node2 cib: info: cib_perform_op: Diff:
--- 0.69.162 2
Dec 18 17:22:39 [2690] node2 cib: info: cib_perform_op: Diff:
+++ 0.69.163 (null)
Dec 18 17:22:39 [2690] node2 cib: info: cib_perform_op: + /cib:
@num_updates=163
Dec 18 17:22:39 [2690] node2 cib: info: cib_perform_op: ++
/cib/status/node_state[@id='node2']/transient_attributes[@id='node2']/instance_attributes[@id='status-node2']:
<nvpair id="status-node2-fail-count-p_rbd_map_1"
name="fail-count-p_rbd_map_1" value="INFINITY"/>
.........
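In case it helps with diagnosing the timeout, these are the commands I would use to check whether node1 still holds a lock or a watch on the image after the power cut (pool/image names are placeholders; the listwatchers line assumes a format 1 image whose header object is <image>.rbd):

  rbd lock list rbd/share1
  rados -p rbd listwatchers share1.rbd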
Thanks.