Thanks a lot for your suggestions (Robert and Lars), it took a while before I was able to try them on virtual machines. I hope you don't mind that I reply to both of you in one mail -- I messed up my mail delivery options (now corrected).

I've added my latest drbd config below, for reference.

I can't find any sequence of commands that convinces drbd (or pacemaker) that I *want* to use outdated data. I expected this to work:

drbdadm del-peer tapas:fims1
drbdadm primary --force tapas

This seems to work briefly (the resource reaches UpToDate), until the next time the pacemaker drbd monitor runs and 'demotes' the resource back to its original state.

Failed Resource Actions:
* drbd_monitor_20000 on vmnbiaas2 'master' (8): call=84, status=complete, exitreason='',
    last-rc-change='Wed Oct  9 14:49:06 2019', queued=0ms, exec=0ms

The corosync logs are difficult to follow, so I'm not sure how to get pacemaker to accept the trickery done behind its back.
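One thing I haven't tried yet (a sketch; I'm assuming the pcs CLI is available, and I'm guessing a master/slave resource id from the failed action above -- substitute the real one) is to take the resource out of pacemaker's control before doing the manual drbdadm surgery, so the monitor can't demote it again:

```shell
# Hypothetical resource id; use the actual master/slave (clone) resource
# that wraps the drbd primitive.
pcs resource unmanage drbd-master

# ... manual drbdadm del-peer / primary --force steps here ...

# Hand control back to pacemaker once the state looks right.
pcs resource manage drbd-master
```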

Lars wrote:

Alternatively, you could *add* a suitable fencing constraint to your sole 
survivor node, which should make the fencing succeed.

You could tell the crm-fence-peer.9.sh fencing handler that an 
--unreachable-peer-is-outdated.
(Manually. From a root shell. That switch is not effective from within the drbd 
configuration; for reasons).

I tried this, after finding what the command should look like in /var/log/messages:

DRBD_BACKING_DEV_0=/dev/mapper/centos-drbd DRBD_CONF=/etc/drbd.conf DRBD_LL_DISK=/dev/mapper/centos-drbd DRBD_MINOR=0 DRBD_MINOR_0=0 DRBD_MY_ADDRESS=172.17.5.62 DRBD_MY_AF=ipv4 DRBD_MY_NODE_ID=1 DRBD_NODE_ID_0=vmnbiaas1 DRBD_NODE_ID_1=vmnbiaas2 DRBD_PEER_ADDRESS=172.17.5.61 DRBD_PEER_AF=ipv4 DRBD_PEER_NODE_ID=0 DRBD_RESOURCE=tapas DRBD_VOLUME=0 UP_TO_DATE_NODES=0x00000002 /usr/lib/drbd/crm-fence-peer.9.sh --unreachable-peer-is-outdated
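(As an aside, the UP_TO_DATE_NODES value in that invocation appears to be a bitmask of DRBD node ids; assuming bash, checking whether one's own node-id bit is set looks like this:)

```shell
# Values copied from the handler invocation above.
UP_TO_DATE_NODES=0x00000002
DRBD_MY_NODE_ID=1

# Shift the mask right by our node id and test the low bit.
if (( (UP_TO_DATE_NODES >> DRBD_MY_NODE_ID) & 1 )); then
  echo "node $DRBD_MY_NODE_ID is flagged UpToDate in the mask"
else
  echo "node $DRBD_MY_NODE_ID is NOT flagged UpToDate in the mask"
fi
```

With the values above, node 1 (this host, vmnbiaas2) does have its bit set.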

This failed as follows:

Oct  9 14:29:42 vmnbiaas2 crm-fence-peer.9.sh[6153]: WARNING Found <cib crm_feature_set="3.0.14" validate-with="pacemaker-2.10" epoch="48" num_updates="23" admin_epoch="0" cib-last-written="Wed Oct  9 14:07:22 2019" update-origin="vmnbiaas1" update-client="cibadmin" update-user="root" have-quorum="0" dc-uuid="1"

Oct  9 14:29:42 vmnbiaas2 crm-fence-peer.9.sh[6153]: WARNING I don't have quorum; did not place the constraint!

OK, while experimenting, I quick-hacked the script to set

fail_if_no_quorum=false

After which the error changes to

Oct  9 14:38:13 vmnbiaas2 crm-fence-peer.9.sh[7579]: WARNING some peer is UNCLEAN, my disk is not UpToDate, did not place the constraint!
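Since the complaint is now about the UNCLEAN peer, the next thing I may try (assuming pcs; and only safe if the peer really is down for good, since this tells pacemaker the node was successfully fenced) is to confirm the fencing by hand:

```shell
# DANGEROUS if vmnbiaas1 might still be running:
# this declares the node safely fenced/powered off.
pcs stonith confirm vmnbiaas1
```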

Cheers!

     Rob


resource tapas {
  protocol C;

  startup {
    wfc-timeout          0;    ## Infinite!
    outdated-wfc-timeout 120;
    degr-wfc-timeout     120;  ## 2 minutes.
  }

  disk {
    on-io-error detach;
  }

  handlers {
    split-brain "/opt/sol/tapas/bin/split-brain-helper.sh";

    fence-peer "/usr/lib/drbd/crm-fence-peer.9.sh";
    unfence-peer "/usr/lib/drbd/crm-unfence-peer.9.sh";
  }

  net {
    fencing resource-only;

#    after-sb-0pri       discard-least-changes;
  }

  device    /dev/drbd0;
  disk      /dev/mapper/centos-drbd;
  meta-disk internal;

  on vmnbiaas1 {
    address    172.17.5.61:7789;
  }

  on vmnbiaas2 {
    address    172.17.5.62:7789;
  }
}

_______________________________________________
Star us on GITHUB: https://github.com/LINBIT
drbd-user mailing list
[email protected]
https://lists.linbit.com/mailman/listinfo/drbd-user
