On Sat, Aug 22, 2015 at 12:00:01PM +0200, [email protected] 
wrote:
> Date: Sat, 22 Aug 2015 11:29:37 +0200
> From: Lars Ellenberg <[email protected]>
> Subject: Re: [DRBD-user] drbd.ocf misinterpreting role status with
>       multiple volumes
> 
> ...
> Yes, that is the case, and the above is intentional.
> Though I don't remember all of the reasoning right now.  Probably has
> something to do with adding a volume to an existing resource.
> 
> If you want independent volumes, use independent resources.
Nope, we like the write-order guarantee :-)

> Pacemaker would not have any way to deal with a "partially promoted" DRBD.
Too bad. If the RA knew the intended role, it could try to enforce it
during a monitor. Though a monitor that actively changes things might be
a bit surprising.

> Anyone cares enough to prepare a patch?
I attached a patch (for 8.4.4, which is the version we are using). It
returns OCF_FAILED_MASTER in a monitor operation, but leaves the rc of
the internal drbd_status untouched. Otherwise all drbd_status()
consumers would need to handle that extra case.

Note that it also returns OCF_FAILED_MASTER if the resource is supposed
to be in Secondary role and one subvolume is Primary (showing as "Slave"
in crm_mon).
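
To make the idea easier to follow without reading the diff, here is a
minimal standalone sketch of the same logic. It is not the code from the
patch: role_rank, role_min_max and adjust_monitor_rc are made-up helper
names, the OCF return codes are hard-coded here (the real RA gets them
from ocf-shellfuncs), and the roles are passed as arguments instead of
being read from DRBD_ROLE_LOCAL. Unknown role strings are also handled
slightly differently than in the patch.

```shell
#!/bin/bash
# Assumption: values hard-coded for the sketch; the RA sources them.
OCF_SUCCESS=0
OCF_RUNNING_MASTER=8
OCF_FAILED_MASTER=9

# Rank order as in the patch comment:
#   Primary > Secondary > Unconfigured > Unknown > *
role_rank() {
    case "$1" in
        Primary)      echo 3 ;;
        Secondary)    echo 2 ;;
        Unconfigured) echo 1 ;;
        *)            echo 0 ;;   # Unknown or anything else
    esac
}

# Prints "<min role> <max role>" for the per-volume roles given as arguments.
role_min_max() {
    local role rank
    local min_rank=4 max_rank=-1 min_role=Unknown max_role=Unknown
    for role in "$@"; do
        rank=$(role_rank "$role")
        if [ "$rank" -lt "$min_rank" ]; then min_rank=$rank; min_role=$role; fi
        if [ "$rank" -gt "$max_rank" ]; then max_rank=$rank; max_role=$role; fi
    done
    echo "$min_role $max_role"
}

# Usage: adjust_monitor_rc <status from drbd_status> <role of each volume>...
# Turns OCF_RUNNING_MASTER into OCF_FAILED_MASTER unless every volume
# is Primary; every other status is passed through unchanged.
adjust_monitor_rc() {
    local status=$1; shift
    local min max
    read -r min max <<< "$(role_min_max "$@")"
    if [ "$status" -eq "$OCF_RUNNING_MASTER" ] && [ "$min" != "Primary" ]; then
        status=$OCF_FAILED_MASTER
    fi
    echo "$status"
}
```

For example, "adjust_monitor_rc 8 Primary Secondary" prints 9, because
drbd_status() reports OCF_RUNNING_MASTER as soon as any volume is Primary
while the minimum role is only Secondary. That is also why a resource
that is supposed to be Secondary ends up as OCF_FAILED_MASTER when a
single subvolume is Primary.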


Regards
Matthias
-- 
one4vision GmbH                    Fon +49 681 96727 - 60
Residenz am Schlossgarten          Fax +49 681 96727 - 69
Talstraße 34-42                    [email protected]
D-66119 Saarbrücken                http://www.one4vision.de
HRB 11751                          verantwortl. Geschäftsführer:
Amtsgericht Saarbrücken            Christof Allmann, Christoph Harth
--- drbd.orig   2014-02-14 03:47:56.000000000 +0100
+++ drbd        2015-08-24 22:45:10.730542848 +0200
@@ -309,6 +309,37 @@
 
        # Populates a set of variables relevant to DRBD's status
        eval "$($DRBDSETUP "$DRBD_RESOURCE" sh-status)"
+
+
+       # to detect inconsistent roles in multi-volume setups,
+       # find the minimum and the maximum of the local roles
+       #   Primary > Secondary > Unconfigured > Unknown > *
+       # used in drbd_promote() and drbd_status()
+       DRBD_MAX_ROLE_LOCAL="Unknown"
+       DRBD_MIN_ROLE_LOCAL="Unknown"
+
+       local rolelist role
+       rolelist=${DRBD_ROLE_LOCAL[*]}
+       for role in Primary Secondary Unconfigured; do
+               if [ "${rolelist[*]//${role}}" != "${rolelist[*]}" ]; then
+                       DRBD_MIN_ROLE_LOCAL=$role
+               fi
+               rolelist=(${rolelist[*]//${role}})
+       done
+       if [ -n "${rolelist[0]}" ]; then
+               DRBD_MIN_ROLE_LOCAL=${rolelist[0]}
+       fi
+
+
+       DRBD_MAX_ROLE_LOCAL="Unknown"
+       rolelist=${DRBD_ROLE_LOCAL[*]}
+       for role in Unconfigured Secondary Primary; do
+               if [ "${rolelist[*]//${role}}" != "${rolelist[*]}" ]; then
+                       DRBD_MAX_ROLE_LOCAL=$role
+               fi
+               rolelist=(${rolelist[*]//${role}})
+       done
+
 }
 
 # This is not the only fencing mechanism.
@@ -533,6 +564,14 @@
                rc=$OCF_ERR_GENERIC
        esac
 
+       # multivolume resources with inconsistent roles are logged as an error
+       # but turning OCF_RUNNING_MASTER into OCF_FAILED_MASTER is done only
+       # in drbd_monitor() to spare drbd_start()/drbd_promote()/drbd_demote()
+       # handling an additional case
+       if [ "${DRBD_MIN_ROLE_LOCAL}" != "${DRBD_MAX_ROLE_LOCAL}" ]; then
+               ocf_log err "multivolume ${DRBD_RESOURCE} with inconsistent roles: >>${DRBD_ROLE_LOCAL[*]}<<"
+       fi
+
        return $rc
 }
 
@@ -561,6 +600,13 @@
                drbd_update_master_score
        fi
 
+        # for multivolume resources, drbd_status() returns OCF_RUNNING_MASTER
+        # if any subvolume is found in Primary role
+        # change that to OCF_FAILED_MASTER if any subvolume is not Primary
+        if [ "${status}" -eq "${OCF_RUNNING_MASTER}" -a "${DRBD_MIN_ROLE_LOCAL}" != "Primary" ]; then
+            status=$OCF_FAILED_MASTER
+        fi
+
        case $status in
        (0) : "OCF_SUCCESS" ;;
        (1) : "OCF_ERR_GENERIC" ;;
@@ -714,8 +760,24 @@
                        break
                        ;;
                $OCF_RUNNING_MASTER)
-                       rc=$OCF_SUCCESS
-                       break
+                       if [ "${DRBD_MIN_ROLE_LOCAL}" = "Primary" ]; then
+                               # only if all subvolumes are in "Primary" state
+                               rc=$OCF_SUCCESS
+                               break
+                       else
+                               # not all subvolumes are "Primary" (not just yet?)
+                               # keep re-trying (worst case: until timeout)
+                               do_drbdadm primary $DRBD_RESOURCE
+                               if [[ $? = 17 ]]; then
+                                       # All available disks are inconsistent,
+                                       # or I am consistent, but failed to fence the peer.
+                                       # Cannot become primary.
+                                       # No need to retry indefinitely.
+                                       ocf_log crit "Refusing to be promoted to Primary without UpToDate data"
+                                       break
+                               fi
+                       fi
+                       ;;
                esac
                $first_try || sleep 1
                first_try=false
@@ -749,7 +811,7 @@
                        break
                        ;;
                $OCF_NOT_RUNNING)
-                       ocf_log error "Trying to promote a resource that was not started"
+                       ocf_log error "Trying to demote a resource that was not started"
                        break
                        ;;
                $OCF_RUNNING_MASTER)