Re: [DRBD-user] DRBD9&Pacemaker

刘丹 Tue, 27 Sep 2016 07:07:55 -0700

Delivery to one or more mail servers failed, so I am sending this again.





At 2016-09-27 15:47:37, "Nick Wang" <[email protected]> wrote:
>>>> On 2016-9-26 at 19:17, in message
><CACp6BS7W6PyW=453wkrrfgsz+f0mqhql2m9fjsxficocq+w...@mail.gmail.com>, Igor
>Cicimov <[email protected]> wrote:
>> On 26 Sep 2016 7:26 pm, "mzlld1988" <[email protected]> wrote:
>> >
>> > I applied the attached patch file to scripts/drbd.ocf, and Pacemaker can
>> > now start DRBD successfully, but only on two nodes; DRBD on the third
>> > node is down. Is that expected?
>> Well, you didn't say you have 3 nodes. Usually you use Pacemaker with 2 nodes
>> and DRBD.
>The patch is supposed to help in scenarios with 3 (or more) nodes, as long as
>there is only one Primary.
>Is the 3-node DRBD cluster working without Pacemaker? And how did you
>configure it in Pacemaker?
According to http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf, I
executed the following commands to configure DRBD in Pacemaker.
 [root@pcmk-1 ~]# pcs cluster cib drbd_cfg
 [root@pcmk-1 ~]# pcs -f drbd_cfg resource create WebData ocf:linbit:drbd \
     drbd_resource=wwwdata op monitor interval=60s
 [root@pcmk-1 ~]# pcs -f drbd_cfg resource master WebDataClone WebData \
     master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
     notify=true
 [root@pcmk-1 ~]# pcs cluster cib-push drbd_cfg
>> > Another question: can Pacemaker successfully stop the slave node?
>> > In my case, Pacemaker can't stop the slave node.
>> >
>Yes. You need to check the log to see which resource prevents Pacemaker from stopping.


Pacemaker can't stop DRBD on the slave node. I think the reason may be the same
as in my previous email (see the attached message), but no one replied to that email.
[root@drbd ~]# pcs status
 Cluster name: mycluster
 Stack: corosync
 Current DC: drbd.node103 (version 1.1.15-e174ec8) - partition with quorum
 Last updated: Mon Sep 26 04:36:50 2016          Last change: Mon Sep 26 04:36:49 2016 by root via cibadmin on drbd.node101

3 nodes and 2 resources configured

Online: [ drbd.node101 drbd.node102 drbd.node103 ]

Full list of resources:

 Master/Slave Set: WebDataClone [WebData]
   Masters: [ drbd.node102 ]
   Slaves: [ drbd.node101 ]

Daemon Status:
   corosync: active/corosync.service is not a native service, redirecting to /sbin/chkconfig.
 Executing /sbin/chkconfig corosync --level=5
 enabled
   pacemaker: active/pacemaker.service is not a native service, redirecting to /sbin/chkconfig.
 Executing /sbin/chkconfig pacemaker --level=5
 enabled
   pcsd: active/enabled

-------------------------------------------------------------
Failed to execute 'pcs cluster stop drbd.node101'

= Error messages on drbd.node101 (secondary node)
  Sep 26 04:39:26 drbd lrmd[3521]:  notice: WebData_stop_0:4726:stderr [ Command 'drbdsetup down r0' terminated with exit code 11 ]
  Sep 26 04:39:26 drbd lrmd[3521]:  notice: WebData_stop_0:4726:stderr [ r0: State change failed: (-10) State change was refused by peer node ]
  Sep 26 04:39:26 drbd lrmd[3521]:  notice: WebData_stop_0:4726:stderr [ additional info from kernel: ]
  Sep 26 04:39:26 drbd lrmd[3521]:  notice: WebData_stop_0:4726:stderr [ failed to disconnect ]
  Sep 26 04:39:26 drbd lrmd[3521]:  notice: WebData_stop_0:4726:stderr [ Command 'drbdsetup down r0' terminated with exit code 11 ]
  Sep 26 04:39:26 drbd crmd[3524]:   error: Result of stop operation for WebData on drbd.node101: Timed Out | call=12 key=WebData_stop_0 timeout=100000ms
  Sep 26 04:39:26 drbd crmd[3524]:  notice: drbd.node101-WebData_stop_0:12 [ r0: State change failed: (-10) State change was refused by peer node\nadditional info from kernel:\nfailed to disconnect\nCommand 'drbdsetup down r0' terminated with exit code 11\nr0: State change failed: (-10) State change was refused by peer node\nadditional info from kernel:\nfailed to disconnect\nCommand 'drbdsetup down r0' terminated with exit code 11\nr0: State change failed: (-10) State change was refused by peer node\nadditional info from kernel:\nfailed t
= Error messages on drbd.node102 (primary node)
  Sep 26 04:39:25 drbd kernel: drbd r0 drbd.node101: Preparing remote state change 3578772780 (primary_nodes=4, weak_nodes=FFFFFFFFFFFFFFFB)
  Sep 26 04:39:25 drbd kernel: drbd r0: State change failed: Refusing to be Primary while peer is not outdated
  Sep 26 04:39:25 drbd kernel: drbd r0: Failed: susp-io( no -> fencing)
  Sep 26 04:39:25 drbd kernel: drbd r0 drbd.node101: Failed: conn( Connected -> TearDown ) peer( Secondary -> Unknown )
  Sep 26 04:39:25 drbd kernel: drbd r0/0 drbd1 drbd.node101: Failed: pdsk( UpToDate -> DUnknown ) repl( Established -> Off )
  Sep 26 04:39:25 drbd kernel: drbd r0 drbd.node101: Aborting remote state change 3578772780
>> > I'm looking forward to your answers. Thanks.
>> >
>> Yes, it works with a 2-node DRBD9 setup configured in the standard way, not
>> via drbdmanage. I haven't tried any other layout.
>> 
>> >
>
>Best regards,
>Nick

--- Begin Message ---

Hi everyone,
I have a question about removing the secondary node of DRBD9.
When fencing is set, is it normal that we can't remove the secondary node under
DRBD9, while the same operation succeeds under DRBD 8.4.6?

The DRBD kernel source is the newest version (9.0.4-1). The DRBD utils version
is 8.9.6.
Description:
    3 nodes; one of the nodes is primary, and the disk state is UpToDate. Fencing is set.
    I got the error message 'State change failed: (-7) State change was refused
    by peer node' when executing the command 'drbdadm down <res-name>' on any of
    the secondary nodes.

Analysis:
    When the down command is executed on one of the secondary nodes, that
    node runs the function 'change_cluster_wide_state' in drbd_state.c.
    change_cluster_wide_state()
    {
        ...
        if (have_peers) {
                /* ① Wait for the peer node to reply; the thread sleeps
                 *    until the peer node replies. */
                if (wait_event_timeout(resource->state_wait,
                               cluster_wide_reply_ready(resource),
                               twopc_timeout(resource))) {
                    /* ② Get the reply info. */
                    rv = get_cluster_wide_reply(resource);
                } else {
                }
        ...
    }

    Process ①
        The primary node executes the following call chain:
            ..->try_state_change->is_valid_soft_transition->__is_valid_soft_transition

            Finally, __is_valid_soft_transition returns the error code
            SS_PRIMARY_NOP.


            if (peer_device->connection->fencing_policy >= FP_RESOURCE &&
                !(role[OLD] == R_PRIMARY && repl_state[OLD] < L_ESTABLISHED &&
                  !(peer_disk_state[OLD] <= D_OUTDATED)) &&
                 (role[NEW] == R_PRIMARY && repl_state[NEW] < L_ESTABLISHED &&
                  !(peer_disk_state[NEW] <= D_OUTDATED)))
                    return SS_PRIMARY_NOP;


            The primary node sets the drbd_packet to P_TWOPC_NO; the secondary
            node receives this reply and sets the connection status to TWOPC_NO.
            At this point, Process ① finishes.
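
    To make the refusal easier to follow, here is a minimal, self-contained C
    sketch of the condition quoted above. It is not the kernel code: the enum
    definitions, struct peer_view and both helper names are simplified
    stand-ins invented for illustration, and the only thing assumed to match
    DRBD is the relative ordering L_OFF < L_ESTABLISHED and
    D_OUTDATED < D_UNKNOWN < D_UP_TO_DATE.

```c
#include <stdbool.h>

/* Simplified stand-ins for DRBD's enums (assumption: only the relative
 * order of the values matters for this check). */
enum role { R_UNKNOWN, R_PRIMARY, R_SECONDARY };
enum repl_state { L_OFF, L_ESTABLISHED };
enum disk_state { D_OUTDATED, D_UNKNOWN, D_UP_TO_DATE };
enum fencing_policy { FP_DONT_CARE, FP_RESOURCE, FP_STONITH };

/* Hypothetical snapshot of what the checking node sees for one peer,
 * either before (OLD) or after (NEW) the proposed state change. */
struct peer_view {
    enum role role;            /* role of the local (checking) node */
    enum repl_state repl;      /* replication state towards the peer */
    enum disk_state peer_disk; /* disk state recorded for the peer */
};

/* True if the node is Primary, is not replicating to the peer, and the
 * peer's disk is not known to be Outdated or worse. */
static bool primary_with_unfenced_peer(struct peer_view v)
{
    return v.role == R_PRIMARY && v.repl < L_ESTABLISHED &&
           !(v.peer_disk <= D_OUTDATED);
}

/* Mirrors the quoted condition: with fencing enabled, refuse a transition
 * that newly enters the "primary with unfenced peer" situation. */
static bool refused_as_primary_nop(enum fencing_policy fp,
                                   struct peer_view old_view,
                                   struct peer_view new_view)
{
    return fp >= FP_RESOURCE &&
           !primary_with_unfenced_peer(old_view) &&
           primary_with_unfenced_peer(new_view);
}
```

    With fp = FP_RESOURCE, the transition from the primary's log
    ({ R_PRIMARY, L_ESTABLISHED, D_UP_TO_DATE } to { R_PRIMARY, L_OFF,
    D_UNKNOWN }) is refused by this model, whereas the same transition ending
    in D_OUTDATED would be accepted.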


    Process ②
           rv will be set to SS_CW_FAILED_BY_PEER
        
    ==== Version 8.4.6 ====
        One node is primary, the other is secondary.
        When 'drbdadm down <res-name>' is executed on the secondary node, the
        same error message is recorded in the log file on the first attempt,
        which tries to change the peer disk to D_UNKNOWN.
        But the command succeeds on the second attempt, which changes the peer
        disk to D_OUTDATED.
        
        The following code reports the error.
        is_valid_state()
        {
            ...
            if (fp >= FP_RESOURCE &&
                ns.role == R_PRIMARY && ns.conn < C_CONNECTED &&
                ns.pdsk >= D_UNKNOWN /* ① */) {
                    rv = SS_PRIMARY_NOP;
            }
            ...
        }
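
        As a rough illustration of why the second attempt passes this check,
        the sketch below models the quoted 8.4.6 condition in self-contained
        C. The enums, struct state_84 and the function name are simplified,
        hypothetical stand-ins; only the ordering D_OUTDATED < D_UNKNOWN <
        D_UP_TO_DATE and "StandAlone is below Connected" are assumed to
        match DRBD.

```c
#include <stdbool.h>

/* Simplified stand-ins for the 8.4 enums (assumption: only the relative
 * order of the values matters for this check). */
enum role { R_UNKNOWN, R_PRIMARY, R_SECONDARY };
enum conn_state { C_STANDALONE, C_DISCONNECTING, C_CONNECTED };
enum disk_state { D_OUTDATED, D_UNKNOWN, D_UP_TO_DATE };
enum fencing_policy { FP_DONT_CARE, FP_RESOURCE, FP_STONITH };

/* Hypothetical, reduced version of the new state (ns) being validated. */
struct state_84 {
    enum role role;       /* local role */
    enum conn_state conn; /* connection state */
    enum disk_state pdsk; /* peer disk state */
};

/* Mirrors the quoted check: with fencing enabled, a disconnected Primary
 * is rejected unless the peer disk is below D_UNKNOWN (e.g. Outdated). */
static bool is_primary_nop_84(enum fencing_policy fp, struct state_84 ns)
{
    return fp >= FP_RESOURCE && ns.role == R_PRIMARY &&
           ns.conn < C_CONNECTED && ns.pdsk >= D_UNKNOWN;
}
```

        In this model a new state of { R_PRIMARY, C_STANDALONE, D_UNKNOWN }
        is rejected, while { R_PRIMARY, C_STANDALONE, D_OUTDATED } is
        accepted. Unlike the 9.0 condition quoted earlier, this check only
        inspects the new state and does not compare it with the old one.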
       
        After executing the command 'drbdadm down <res-name>' on the secondary
        node, the status of the primary node is:
        [root@drbd846 drbd-8.4.6]# cat /proc/drbd
        version: 8.4.6 (api:1/proto:86-101)
        GIT-hash: 833d830e0152d1e457fa7856e71e11248ccf3f70 build by [email protected], 2016-09-08 08:51:45
         0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/Outdated   r-----
            ns:1048508 nr:0 dw:0 dr:1049236 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

        The peer disk state is Outdated, not DUnknown.

--- End Message ---

_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
