Hi,
I recently had the following occur on the primary node of a DRBD resource,
running DRBD 8.4.5 on CentOS 6.6 (kernel 2.6.32-504.el6.x86_64):
Nov 11 05:34:54 kernel: block drbd5: Remote failed to finish a request within ko-count * timeout
Nov 11 05:34:54 kernel: block drbd5: peer( Secondary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown )
Being unfamiliar with ko-count, I looked at the documentation and found:
    ko-count number
        In case the secondary node fails to complete a single write request for
        count times the timeout, it is expelled from the cluster. (I.e. the primary
        node goes into StandAlone mode.) The default value is 0, which disables this
        feature.
The thing is -- nowhere in my config was ko-count set. So seeing it apparently
kick in was an unwelcome surprise. I have since set ko-count and timeout to
"large" values in the hope that it doesn't happen again.
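For what it's worth, here is a rough sketch of how I read the "ko-count * timeout" window from the log message and the man page text. It assumes the net timeout is given in tenths of a second (as I understand drbd.conf to specify it) and that a ko-count of 0 disables the check, as the documentation says. The function name is made up for illustration, and the values in the example are placeholders, not my actual settings:

```python
def ko_window_seconds(ko_count, timeout_tenths):
    """Return how long the primary would wait on a single write request
    before giving up on the peer, or None if the check is disabled.

    Assumption: timeout_tenths uses the drbd.conf unit of 0.1 seconds.
    Assumption: ko_count == 0 means "feature disabled", per the man page.
    """
    if ko_count == 0:
        return None
    # ko-count * timeout, converted from tenths of a second to seconds
    return ko_count * timeout_tenths / 10.0


# Placeholder values purely for illustration
print(ko_window_seconds(7, 60))   # 42.0
print(ko_window_seconds(0, 60))   # None
```

If that reading is right, raising either value stretches the window, which is all my "large" settings are meant to do.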
Is this a DRBD bug, or expected behavior? If it's somehow the latter, I think
the combination of the documentation and error messages is quite misleading and
should be fixed.
Thanks,
Zev Weiss