Hi,

I recently had the following occur on the primary node of a DRBD resource, 
running DRBD 8.4.5 on CentOS 6.6 (kernel 2.6.32-504.el6.x86_64):

Nov 11 05:34:54 kernel: block drbd5: Remote failed to finish a request within ko-count * timeout
Nov 11 05:34:54 kernel: block drbd5: peer( Secondary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown )

Being unfamiliar with ko-count, I looked at the documentation and found:

    ko-count number
        In case the secondary node fails to complete a single write request
        for count times the timeout, it is expelled from the cluster. (I.e.
        the primary node goes into StandAlone mode.) The default value is 0,
        which disables this feature.
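To make the documented rule concrete, here is a minimal Python sketch of it. It assumes `timeout` is expressed in tenths of a second, as DRBD's net `timeout` option is; the function names and the example values (ko-count 5, timeout 60) are hypothetical and only illustrate the arithmetic, not DRBD's actual implementation.

```python
from typing import Optional


def ko_deadline_seconds(ko_count: int, timeout_tenths: int) -> Optional[float]:
    """Seconds a single write may stay incomplete on the peer before it is
    expelled, per the quoted documentation; None when ko-count is 0 (disabled)."""
    if ko_count == 0:
        return None
    # Assumption: timeout is in tenths of a second.
    return ko_count * timeout_tenths / 10.0


def peer_is_expelled(ko_count: int, timeout_tenths: int, pending_seconds: float) -> bool:
    """True if a write pending for `pending_seconds` exceeds ko-count * timeout."""
    deadline = ko_deadline_seconds(ko_count, timeout_tenths)
    return deadline is not None and pending_seconds > deadline


# Hypothetical values: ko-count 5 with timeout 60 (6 s) gives a 30 s deadline.
print(ko_deadline_seconds(5, 60))
# With the documented default of 0, the peer should never be expelled this way.
print(peer_is_expelled(0, 60, 1000.0))
```

Under the documented default of 0, the "Remote failed to finish a request within ko-count * timeout" message above should not be reachable at all.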

The thing is -- nowhere in my config was ko-count set.  So seeing it apparently 
kick in was an unwelcome surprise.  I have since set ko-count and timeout to 
"large" values in the hope that it doesn't happen again.

Is this a DRBD bug, or expected behavior?  If it's somehow the latter, I think 
the combination of the documentation and error messages is quite misleading and 
should be fixed.


Thanks,
Zev Weiss

_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user
