Re: [DRBD-user] Primary not disconnecting Secondary with IO, problems (Was: Re: drbd-user Digest, Vol 65, Issue 4)

Jeff Orr Mon, 07 Dec 2009 01:13:51 -0800

Whoops, guess I fixated on one aspect of the problem, specifically the
"I/O errors on the secondary stop I/O on the primary" part. I was thinking
that NFS problems were affecting one or both hosts. I don't think the
primary will ever deliberately disconnect from the secondary unless a
split-brain occurs.


Can you go into more detail about what you mean by "stops I/O on the
primary"? Do I/O requests to the DRBD volume start failing? Does the
primary freeze/act odd in any other way?
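
On the write timeout question: as a rough sanity check of the ko-count and
timeout values quoted below, here is a small sketch of the arithmetic. It
assumes the behaviour described in the drbd.conf man page (timeout is given
in tenths of a second, and the peer is expelled once a single write request
has been outstanding for ko-count times the timeout; ko-count 0 disables the
check). The function name is made up for illustration and is not part of DRBD.

```python
from typing import Optional


def ko_window_seconds(ko_count: int, timeout_tenths: int) -> Optional[float]:
    """Seconds a single write may stay unanswered before the peer is dropped.

    Assumes drbd.conf semantics: `timeout` is in units of 0.1 s and
    `ko-count` 0 means the check is disabled (returns None in that case).
    """
    if ko_count == 0:
        return None
    return ko_count * timeout_tenths / 10.0


# Values from the net section quoted below: ko-count 2, timeout 20.
print(ko_window_seconds(2, 20))
```

If that reading is right, the primary should give up on the secondary after
about 4 seconds with those settings, so a hang lasting much longer than that
suggests the timeout path is not being reached at all.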

James Masson wrote:
> Hi Jeff,
>
> thanks for the response.
>
> I'm using "rsize=32768,wsize=32768,nointr,timeo=300,noatime"
>
> But I don't see what that has to do with a DRBD Primary not detecting that
> its Secondary is broken and disconnecting it, or with the Secondary itself
> not realising it's broken when it should have disconnected by itself.
>
> I can reproduce the issue without NFS, just using local filesystem 
> interaction on the Primary.
>
> Am I missing something about how write timeouts work on DRBD?
>
> James
>
> jeff wrote:
>   
>> Are you mounting the NFS volumes with a timeout? I seem to recall that
>> an NFS timeout can really screw with a system, whether it's primary or
>> secondary. I usually mount my NFS with the soft,timeo=30 options.
>>
>> Hope that helps.
>>     
>>> Message: 2
>>> Date: Wed, 02 Dec 2009 09:21:45 +0000
>>> From: James Masson <[email protected]>
>>> Subject: Re: [DRBD-user] Primary not disconnecting Secondary with IO
>>>     problems
>>> To: [email protected]
>>> Message-ID: <[email protected]>
>>> Content-Type: text/plain; charset=ISO-8859-1
>>>
>>>
>>> has anybody seen this before, got any insight?
>>>
>>> James
>>>
>>> James Masson wrote:
>>>   
>>>       
>>>> Hi list,
>>>>
>>>> I'm using DRBD and NFS to provide HA to Virtual Machine images between 
>>>> pairs of storage servers.
>>>>
>>>> Systems are RHEL5.4 2.6.18-164.el5 + drbd8.3 from Centos Extras
>>>>
>>>> We've been having issues where disk I/O problems on the DRBD Secondary
>>>> stop all I/O to the Primary too. DRBD doesn't seem to recognise these
>>>> disk I/O problems, and the Secondary isn't disconnected automatically.
>>>> Everything just hangs.
>>>>
>>>> During this state:
>>>> If I try a "drbdadm disconnect all" on the Primary, the command hangs.
>>>> If I try this on the Secondary, the command eventually completes, and
>>>> NFS I/O returns to normal operation on the Primary.
>>>>
>>>> I've tried the following things to fix this:
>>>>
>>>> 1) Putting in a custom local-io-error handler to hard reset the problem
>>>> node.
>>>>
>>>> This never triggers. The default "detach" never triggers either.
>>>>
>>>> 2) Changing the net connection parameters to:
>>>>
>>>>    net {
>>>>            ko-count 2;
>>>>            timeout 20;
>>>>    }
>>>>
>>>> Again, this never triggers.
>>>>
>>>>
>>>> 3) Changing the protocol used from C to B
>>>>
>>>> Doesn't have any effect on the issue - I'd prefer to use C anyway.
>>>>
>>>>
>>>> Any further ideas on how to track this issue down and fix it?
>>>>
>>>> thanks
>>>>
>>>> James Masson
>>>> _______________________________________________
>>>> drbd-user mailing list
>>>> [email protected]
>>>> http://lists.linbit.com/mailman/listinfo/drbd-user
>>>>     
>>>>         
