nickva opened a new pull request #3734:
URL: https://github.com/apache/couchdb/pull/3734


   Previously, users with low {Q, N} dbs often got the `"No DB shards could be 
opened."` error when the cluster was overloaded. The hard-coded 100 msec timeout 
was too short to open even the few available shards, and the whole request 
would crash with a 500 error.
   
   Attempt to calculate an optimal timeout value based on the number of shards 
and the max fabric request timeout limit.
   
   The sequence of timeouts, which double by default, forms a geometric 
progression. Use the well-known closed-form formula for the sum [0], together 
with the maximum request timeout, to calculate the initial timeout. The test 
case illustrates a few examples with some default Q and N values.
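
   As a sketch of the derivation (the symbols here are introduced for 
illustration only and are not taken from the code): let T_0 be the initial 
timeout, r the growth factor (2 by default), n the number of shards that may 
be tried, and T_max the maximum request timeout. Assuming the per-shard 
timeouts should add up to T_max and r is not 1:

```latex
\begin{aligned}
T_{\max} &= T_0 + T_0 r + T_0 r^2 + \dots + T_0 r^{n-1}
          = T_0 \, \frac{r^n - 1}{r - 1}, \qquad r \neq 1 \\[4pt]
T_0 &= T_{\max} \, \frac{r - 1}{r^n - 1} \\[4pt]
T_0 &= \frac{T_{\max}}{2^n - 1} \qquad \text{for } r = 2
\end{aligned}
```

   For example, with n = 3 and an assumed T_max of 60000 msec, this gives 
T_0 = 60000 / 7, or roughly 8571 msec.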
   
   The timeout should not be too low: opening shards takes time, and we don't 
want to quickly cycle through a few initial shards and discard the results. 
The minimum initial timeout is therefore clipped to the previously hard-coded 
100 msec value. Unlike before, this minimum value can now also be configured.
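
   The calculation and the clipping can be sketched in Python as follows. 
This is an illustration only, not the Erlang code in this PR: the function 
name, the parameter names, and the 60000 msec default for the maximum request 
timeout are assumptions, and the number of attempts is taken to equal the 
number of shards.

```python
def initial_timeout(n_shards, factor=2.0, min_timeout=100, max_timeout=60000):
    """Return the first per-shard timeout in msec (hypothetical helper).

    The timeouts t, t*f, t*f^2, ... over n_shards attempts are assumed to
    add up to max_timeout; the result is clipped to min_timeout.
    """
    if n_shards < 1:
        raise ValueError("n_shards must be >= 1")
    if factor == 1:
        # No growth: the budget is split evenly across the attempts.
        series_sum = n_shards
    else:
        # Closed form of 1 + f + f^2 + ... + f^(n_shards - 1).
        series_sum = (factor ** n_shards - 1) / (factor - 1)
    timeout = int(max_timeout / series_sum)
    # Never go below the configured floor.
    return max(min_timeout, timeout)
```

   With these assumed defaults, a db with few shards gets a large initial 
timeout, while a db with many shards falls back to the 100 msec floor because 
the computed value would be smaller than that.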
   
   Another issue with the previous code was that it emitted a generic error 
without a specific reason why the shards could not be opened. Timeout was the 
most likely reason, but to confirm it the user had to either enable debug 
logging or apply clever Erlang tracing on the `couch_log:debug/2` call. As an 
improvement, pass the reason string into the `get_shard/5` recursive call so 
it can be bubbled up with the error tuple.
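
   The idea of carrying the failure reason through the recursion can be 
sketched in Python. This is a hedged illustration, not the actual 
`get_shard/5` code: `open_first_shard`, `NoShardsOpened`, and the `open_shard` 
callable are hypothetical names, and the failure modes are simplified to 
Python exceptions.

```python
class NoShardsOpened(Exception):
    """Raised when every shard attempt failed; carries the last reason."""


def open_first_shard(shards, open_shard, timeout, factor=2, reason="unknown"):
    # open_shard(shard, timeout) is a hypothetical callable that returns a
    # handle on success or raises TimeoutError / OSError on failure.
    if not shards:
        # Out of shards: report the last recorded reason with the error.
        raise NoShardsOpened(f"No DB shards could be opened: {reason}")
    first, rest = shards[0], shards[1:]
    try:
        return open_shard(first, timeout)
    except TimeoutError:
        reason = f"timeout after {timeout} msec on {first}"
    except OSError as err:
        reason = f"{err} on {first}"
    # Try the remaining shards with a longer timeout, passing the reason on.
    return open_first_shard(rest, open_shard, timeout * factor, factor, reason)
```

   When the shard list is exhausted, the error reports why the last attempt 
failed instead of only the generic message.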
   
   [0] https://en.wikipedia.org/wiki/Geometric_series
   
   Fixes: https://github.com/apache/couchdb/issues/3733
   
   

