nickva opened a new pull request #3734:
URL: https://github.com/apache/couchdb/pull/3734
Previously, users with low {Q, N} dbs often got the `"No DB shards could be
opened."` error when the cluster was overloaded. The hard-coded 100 msec timeout
was too low to open the few available shards, and the whole request would fail
with a 500 error.
Attempt to calculate an optimal timeout value based on the number of shards
and the max fabric request timeout limit.
The sequence of timeouts, which double by default on each attempt, forms a
geometric progression. Use the well-known closed-form formula for its sum [0],
together with the maximum request timeout, to calculate the initial timeout.
The test case illustrates a few examples with some default Q and N values.
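As a worked illustration of that calculation (the symbols are assumptions made for this note, not names from the patch): let t_0 be the initial timeout, f the growth factor (2 by default), n the number of attempts, and T_max the maximum request timeout. Requiring the timeouts to sum to T_max gives:

```latex
\sum_{i=0}^{n-1} t_0 f^{i} = t_0 \,\frac{f^{n} - 1}{f - 1} = T_{\max}
\quad\Longrightarrow\quad
t_0 = T_{\max} \,\frac{f - 1}{f^{n} - 1}, \qquad f \neq 1
```

With f = 2 this reduces to t_0 = T_max / (2^n - 1). For example, assuming T_max = 60000 msec and n = 3, t_0 is about 8571 msec, and the three timeouts (8571, 17142, 34284) add up to just under 60000 msec. The exact number of terms the patch uses may differ from this sketch.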
Because we don't want the timeout value to be too low, since it takes time
to open shards, and we don't want to quickly cycle through a few initial shards
and discard the results, the minimum initial timeout is clipped to the
previously hard-coded 100 msec timeout. Unlike before, however, this minimum
value is now configurable.
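A minimal Python sketch of the clipping behaviour described above, not the actual Erlang implementation. The function name, parameter names, and the choice of counting `attempts` terms in the series are assumptions made for illustration; only the doubling factor and the 100 msec minimum come from the description.

```python
def initial_timeout_ms(attempts, max_timeout_ms, factor=2.0, min_timeout_ms=100):
    """Return an initial timeout such that `attempts` timeouts, each `factor`
    times the previous one, sum to roughly `max_timeout_ms`.

    Hypothetical helper: names and the exact term count are assumptions.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    if factor == 1:
        # Degenerate series: every attempt gets the same timeout.
        series_weight = attempts
    else:
        # Closed-form sum of 1 + f + f^2 + ... + f^(attempts - 1).
        series_weight = (factor ** attempts - 1) / (factor - 1)
    computed = int(max_timeout_ms / series_weight)
    # Clip to the minimum so shards still get a realistic amount of time to open.
    return max(min_timeout_ms, computed)
```

With few shards the computed value dominates; with many shards the computed value falls below the minimum and the clip takes over, which matches the reasoning that cycling through shards too quickly is counterproductive.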
Another issue with the previous code was that it emitted a generic error
without a specific reason why the shards could not be opened. A timeout was
the most likely reason, but to confirm it, users either had to enable debug
logging or apply clever Erlang tracing on the `couch_log:debug/2` call. So as
an improvement, pass the reason string into the get_shard/5 recursive call so
it can be bubbled up with the error tuple.
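The idea of threading the failure reason through the recursion can be sketched in Python as follows. This is an illustration only: the function signature, the `open_shard` callback, and the shape of the error tuple are hypothetical and do not mirror the real get_shard/5 in the Erlang code.

```python
def get_shard(shards, timeout_ms, factor, open_shard, last_reason=None):
    """Try each shard in turn, growing the timeout after every failure.

    `open_shard(shard, timeout_ms)` is a hypothetical callback that returns a
    handle or raises. The reason for the most recent failure is carried along
    so the final error explains why nothing could be opened.
    """
    if not shards:
        # No shards left: return the generic error together with the reason.
        return ("error", ("no_db_shards_opened", last_reason))
    first, rest = shards[0], shards[1:]
    try:
        return ("ok", open_shard(first, timeout_ms))
    except TimeoutError:
        reason = "timeout"
    except Exception as exc:
        reason = repr(exc)
    # Retry with the next shard, a longer timeout, and the recorded reason.
    return get_shard(rest, timeout_ms * factor, factor, open_shard, reason)
```

A caller that receives the error tuple can then report the reason directly instead of relying on debug logs.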
[0] https://en.wikipedia.org/wiki/Geometric_series
Fixes: https://github.com/apache/couchdb/issues/3733