On Mar 31, 2011, at 2:30 PM, Christoph Bartoschek wrote:
> Am 29.03.2011 15:31, schrieb Dejan Muhamedagic:
>> On Tue, Mar 29, 2011 at 08:13:49AM +0200, Christoph Bartoschek wrote:
>>> Am 29.03.2011 02:35, schrieb Vadym Chepkov:
>>>>
>>>> On Mar 28, 2011, at 10:55 AM, Christoph Bartoschek wrote:
>>>>
>>>>> Am 28.03.2011 16:30, schrieb Dejan Muhamedagic:
>>>>>> Hi,
>>>>>>
>>>>>> On Mon, Mar 21, 2011 at 11:33:49PM +0100, Christoph Bartoschek wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am testing a NFS failover setup. During the tests I created a
>>>>>>> split-brain situation and now node A thinks it is primary and uptodate
>>>>>>> while node B thinks that it is Outdated.
>>>>>>>
>>>>>>> crm_mon however does not indicate any error to me. Why is this the case?
>>>>>>> I expect to see anything that shows me the degraded status. How can this
>>>>>>> be fixed?
>>>>>>
>>>>>> The cluster relies on the RA (in this case drbd) to report any
>>>>>> problems. Do you have a monitor operation defined for that
>>>>>> resource?
>>>>>
>>>>> I have the resource defined as:
>>>>>
>>>>> primitive p_drbd ocf:linbit:drbd \
>>>>>   params drbd_resource="home-data" \
>>>>>   op monitor interval="15" role="Master" \
>>>>>   op monitor interval="30" role="Slave"
>>>>>
>>>>> Is this a correct monitor operation?
>>
>> Yes, though you should also add timeout specs.
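>> For example, something like this (the timeout values here are only
>> illustrative; the RA's meta-data advertises the recommended minimums):
>>
>>   op monitor interval="15" role="Master" timeout="20" \
>>   op monitor interval="30" role="Slave" timeout="20"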
>>
>>>> Just out of curiosity, you do have ms resource defined?
>>>>
>>>> ms ms_p_drbd p_drbd \
>>>>   meta master-max="1" master-node-max="1" clone-max="2" \
>>>>   clone-node-max="1" notify="true"
>>>>
>>>> Because if you do, and the cluster is still not aware of the
>>>> split-brain, then the drbd RA has a serious flaw.
>>>>
>>>
>>> I'm sorry. Yes, the ms resource is also defined.
>>
>> Well, I'm really confused. You basically say that the drbd disk
>> gets into a degraded mode (i.e. it detects split brain), but the
>> cluster (pacemaker) never learns about that. Perhaps you should
>> open a bugzilla for this and supply hb_report. Though it's
>> really hard to believe. It's like basic functionality failing.
>
>
> What would you expect to see?
>
> Currently I see the following in crm_mon:
>
> Master/Slave Set: ms_drbd_nfs [p_drbd_nfs]
> Masters: [ ries ]
> Slaves: [ laplace ]
>
>
> At the same time "cat /proc/drbd" on ries says:
>
> ries:~ # cat /proc/drbd
> version: 8.3.9 (api:88/proto:86-95)
> srcversion: A67EB2D25C5AFBFF3D8B788
> 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r-----
> ns:0 nr:0 dw:4 dr:1761 al:1 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:4
>
>
> And on node laplace it says:
>
> laplace:~ # cat /proc/drbd
> version: 8.3.9 (api:88/proto:86-95)
> srcversion: A67EB2D25C5AFBFF3D8B788
> 0: cs:StandAlone ro:Secondary/Unknown ds:Outdated/DUnknown r-----
> ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:4
>
>
Yes, and according to the RA script everything is perfect:
drbd_status() {
    local rc
    rc=$OCF_NOT_RUNNING
    if ! is_drbd_enabled || ! [ -b "$DRBD_DEVICE" ]; then
        return $rc
    fi

    # ok, module is loaded, block device node exists.
    # lets see its status
    drbd_set_status_variables
    case "${DRBD_ROLE_LOCAL}" in
    Primary)
        rc=$OCF_RUNNING_MASTER
        ;;
    Secondary)
        rc=$OCF_SUCCESS
        ;;
    Unconfigured)
        rc=$OCF_NOT_RUNNING
        ;;
    *)
        ocf_log err "Unexpected role ${DRBD_ROLE_LOCAL}"
        rc=$OCF_ERR_GENERIC
    esac
    return $rc
}
Staggering.
The drbd_set_status_variables subroutine does set DRBD_CSTATE, but
drbd_status never looks at it: only the local role is examined.
I think the RA needs to be modified to something like this:

    Secondary)
        if [[ $DRBD_CSTATE == Connected ]]; then
            rc=$OCF_SUCCESS
        else
            rc=$OCF_NOT_RUNNING
        fi
        ;;
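Rewritten as a tiny standalone function, the proposed behaviour is easy to
check (this is only a sketch: drbd_monitor_rc is a hypothetical helper, the
OCF_* values are the standard OCF exit codes, and a real fix would probably
also have to accept the Sync*/Verify* connection states, not just Connected):

```shell
# Standard OCF exit codes.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7
OCF_RUNNING_MASTER=8

# Map the local DRBD role plus connection state to a monitor return
# code.  A disconnected Secondary is reported as not running, so that
# pacemaker actually notices the degraded / split-brain state.
drbd_monitor_rc() {
    role=$1 cstate=$2
    case "$role" in
    Primary)
        echo $OCF_RUNNING_MASTER
        ;;
    Secondary)
        if [ "$cstate" = "Connected" ]; then
            echo $OCF_SUCCESS
        else
            echo $OCF_NOT_RUNNING
        fi
        ;;
    Unconfigured)
        echo $OCF_NOT_RUNNING
        ;;
    *)
        echo $OCF_ERR_GENERIC
        ;;
    esac
}
```

With the /proc/drbd output above, ries (Primary/StandAlone) would still
return OCF_RUNNING_MASTER, but laplace (Secondary/StandAlone) would now
return OCF_NOT_RUNNING instead of OCF_SUCCESS.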
Vadym
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems