Sending to one or more of the email servers failed, so I am sending this again.
At 2016-09-27 15:47:37, "Nick Wang" <[email protected]> wrote:
>>>> On 2016-9-26 at 19:17, in message
><CACp6BS7W6PyW=453wkrrfgsz+f0mqhql2m9fjsxficocq+w...@mail.gmail.com>, Igor
>Cicimov <[email protected]> wrote:
>> On 26 Sep 2016 7:26 pm, "mzlld1988" <[email protected]> wrote:
>> >
>> > I applied the attached patch file to scripts/drbd.ocf, and pacemaker can
>> now start drbd successfully, but only on two nodes; the third node's drbd
>> is down. Is that right?
>> Well, you didn't say you have 3 nodes. Usually you use pacemaker with 2 nodes
>> and drbd.
>The patch is supposed to help in scenarios with 3 (or more) nodes, as long
>as there is only one Primary.
>Is the 3-node DRBD cluster working without pacemaker? And how did you
>configure it in pacemaker?
According to http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf, I
executed the following commands to configure drbd in pacemaker.
[root@pcmk-1 ~]# pcs cluster cib drbd_cfg
[root@pcmk-1 ~]# pcs -f drbd_cfg resource create WebData ocf:linbit:drbd \
drbd_resource=wwwdata op monitor interval=60s
[root@pcmk-1 ~]# pcs -f drbd_cfg resource master WebDataClone WebData \
master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
notify=true
[root@pcmk-1 ~]# pcs cluster cib-push drbd_cfg
>> > Another question: can pacemaker successfully stop the slave node?
>> In my test, pacemaker can't stop the slave node.
>> >
>Yes. You need to check the log to see which resource prevents pacemaker from stopping.
Pacemaker can't stop the slave node's drbd. I think the reason may be the same as
in my previous email (see attached file), but no one replied to that email.
[root@drbd ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: drbd.node103 (version 1.1.15-e174ec8) - partition with quorum
Last updated: Mon Sep 26 04:36:50 2016 Last change: Mon Sep 26
04:36:49 2016 by root via cibadmin on drbd.node101
3 nodes and 2 resources configured
Online: [ drbd.node101 drbd.node102 drbd.node103 ]
Full list of resources:
Master/Slave Set: WebDataClone [WebData]
Masters: [ drbd.node102 ]
Slaves: [ drbd.node101 ]
Daemon Status:
corosync: active/corosync.service is not a native service, redirecting to
/sbin/chkconfig.
Executing /sbin/chkconfig corosync --level=5
enabled
pacemaker: active/pacemaker.service is not a native service, redirecting to
/sbin/chkconfig.
Executing /sbin/chkconfig pacemaker --level=5
enabled
pcsd: active/enabled
-------------------------------------------------------------
Failed to execute 'pcs cluster stop drbd.node101'
=Error message on drbd.node101 (secondary node)
Sep 26 04:39:26 drbd lrmd[3521]: notice: WebData_stop_0:4726:stderr [
Command 'drbdsetup down r0' terminated with exit code 11 ]
Sep 26 04:39:26 drbd lrmd[3521]: notice: WebData_stop_0:4726:stderr [ r0:
State change failed: (-10) State change was refused by peer node ]
Sep 26 04:39:26 drbd lrmd[3521]: notice: WebData_stop_0:4726:stderr [
additional info from kernel: ]
Sep 26 04:39:26 drbd lrmd[3521]: notice: WebData_stop_0:4726:stderr [ failed
to disconnect ]
Sep 26 04:39:26 drbd lrmd[3521]: notice: WebData_stop_0:4726:stderr [
Command 'drbdsetup down r0' terminated with exit code 11 ]
Sep 26 04:39:26 drbd crmd[3524]: error: Result of stop operation for
WebData on drbd.node101: Timed Out | call=12 key=WebData_stop_0 timeout=100000ms
Sep 26 04:39:26 drbd crmd[3524]: notice: drbd.node101-WebData_stop_0:12 [
r0: State change failed: (-10) State change was refused by peer
node\nadditional info from kernel:\nfailed to disconnect\nCommand 'drbdsetup
down r0' terminated with exit code 11\nr0: State change failed: (-10) State
change was refused by peer node\nadditional info from kernel:\nfailed to
disconnect\nCommand 'drbdsetup down r0' terminated with exit code 11\nr0: State
change failed: (-10) State change was refused by peer node\nadditional info
from kernel:\nfailed t
=Error message on drbd.node102 (primary node)
Sep 26 04:39:25 drbd kernel: drbd r0 drbd.node101: Preparing remote state
change 3578772780 (primary_nodes=4, weak_nodes=FFFFFFFFFFFFFFFB)
Sep 26 04:39:25 drbd kernel: drbd r0: State change failed: Refusing to be
Primary while peer is not outdated
Sep 26 04:39:25 drbd kernel: drbd r0: Failed: susp-io( no -> fencing)
Sep 26 04:39:25 drbd kernel: drbd r0 drbd.node101: Failed: conn( Connected ->
TearDown ) peer( Secondary -> Unknown )
Sep 26 04:39:25 drbd kernel: drbd r0/0 drbd1 drbd.node101: Failed: pdsk(
UpToDate -> DUnknown ) repl( Established -> Off )
Sep 26 04:39:25 drbd kernel: drbd r0 drbd.node101: Aborting remote state
change 3578772780
>> > I'm looking forward to your answers. Thanks.
>> >
>> Yes, it works with a 2-node drbd9 setup configured in the standard way, not
>> via drbdmanager. I haven't tried any other layout.
>>
>> >
>
>Best regards,
>Nick
--- Begin Message ---
Hi everyone,
I have a question about removing the secondary node of DRBD9.
When fencing is set, is it normal that we can't remove the secondary node of
DRBD9, while the same operation succeeds on DRBD 8.4.6?
The DRBD kernel source is the newest version (9.0.4-1). The DRBD utils
version is 8.9.6.
Description:
3 nodes; one of the nodes is primary, the disk state is UpToDate, and fencing is set.
I got the error message 'State change failed: (-7) State change was refused
by peer node' when executing the command 'drbdadm down <res-name>' on any of
the secondary nodes.
Analysis:
When the down command is executed on one of the secondary nodes, that
secondary node runs the function 'change_cluster_wide_state' in
drbd_state.c.
change_cluster_wide_state()
{
    ...
    if (have_peers) {
        /* ① Wait for the peer node to reply; the thread sleeps
         *   until the peer node replies. */
        if (wait_event_timeout(resource->state_wait,
                               cluster_wide_reply_ready(resource),
                               twopc_timeout(resource))) {
            /* ② Get the reply info. */
            rv = get_cluster_wide_reply(resource);
        } else {
        }
    ...
}
Process ①
The primary node executes the following call chain:
..->try_state_change->is_valid_soft_transition->__is_valid_soft_transition
Finally, __is_valid_soft_transition returns the error code
SS_PRIMARY_NOP.
    if (peer_device->connection->fencing_policy >= FP_RESOURCE &&
        !(role[OLD] == R_PRIMARY && repl_state[OLD] < L_ESTABLISHED &&
          !(peer_disk_state[OLD] <= D_OUTDATED)) &&
        (role[NEW] == R_PRIMARY && repl_state[NEW] < L_ESTABLISHED &&
         !(peer_disk_state[NEW] <= D_OUTDATED)))
            return SS_PRIMARY_NOP;
The primary node sets the drbd_packet to P_TWOPC_NO; the secondary node
receives this reply and sets the connection status to TWOPC_NO.
At this point, Process ① finishes.
Process ②
rv will be set to SS_CW_FAILED_BY_PEER
====Version 8.4.6====
One node is primary, the other is secondary.
When 'drbdadm down <res-name>' is executed on the secondary node, the same
error message is recorded in the log file on the first attempt, which tries
to change the peer disk to D_UNKNOWN.
But the command succeeds on the second attempt, which changes the peer disk
to D_OUTDATED.
The following code reports the error.
is_valid_state()
{
    ...
    if (fp >= FP_RESOURCE &&
        ns.role == R_PRIMARY && ns.conn < C_CONNECTED &&
        ns.pdsk >= D_UNKNOWN) {    /* ① */
        rv = SS_PRIMARY_NOP;
    }
    ...
}
After executing the command 'drbdadm down <res-name>' on the secondary
node, the status of the primary node is:
[root@drbd846 drbd-8.4.6]# cat /proc/drbd
version: 8.4.6 (api:1/proto:86-101)
GIT-hash: 833d830e0152d1e457fa7856e71e11248ccf3f70 build by
[email protected], 2016-09-08 08:51:45
0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/Outdated r-----
ns:1048508 nr:0 dw:0 dr:1049236 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1
wo:f oos:0
The peer disk state is Outdated, not DUnknown.
--- End Message ---
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user