nickva opened a new pull request #3524: URL: https://github.com/apache/couchdb/pull/3524
Add AIMD-based batching to couch_jobs activity monitor and notifier

The couch_jobs activity monitor is responsible for finding jobs which have not been updated often enough by their workers and re-enqueuing them. Previously, when the number of jobs grew high enough, couch_jobs could either fail to iterate through all the jobs and time out with a 1007 (`transaction_too_old`) error, or try to re-enqueue so many jobs that the total size of the commit data exceeded the 10MB FDB limit.

The couch_jobs notifier is in charge of notifying subscribers when a job's state changes. If jobs are being updated, it notifies when it notices the updates; otherwise it notifies when a job switches to a new state (update -> pending, update -> finished, etc.). Previously, if there were too many jobs and/or the cluster was overloaded, the notifier could fail with timeouts.

To fix both issues, introduce batching with the batch size adjusted dynamically based on load. As consecutive errors occur, the batch shrinks exponentially, down to 1 row per transaction. Then, with each success, the batch grows linearly by a fixed amount. This self-tuning behavior should work well both during overload and under normal operating conditions.

For tests, since there are already tests which cover enqueuing and subscription, reuse them but make sure they run while errors are periodically generated. That is accomplished with the help of the `meck:loop/1` return specification.

The PR contains two more commits: one to fix and update the next_vs function, and another to simplify handling of retryable errors in couch_jobs.
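The additive-increase / multiplicative-decrease rule described above can be sketched as follows. This is an illustration in Python, not the Erlang code from the PR: the class and function names, the default sizes, the halving factor, and the use of `RuntimeError` as a stand-in for a retryable FDB error are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class AimdBatcher:
    """Hypothetical AIMD batch-size controller; all defaults are assumed."""
    max_size: int = 1000   # assumed cap on rows per transaction
    min_size: int = 1      # the description mentions shrinking down to 1 row
    step: int = 50         # assumed fixed additive increase per success
    factor: float = 0.5    # assumed multiplicative decrease per error
    size: int = 1000       # current batch size

    def success(self) -> int:
        # Additive increase: grow linearly, never above max_size.
        self.size = min(self.max_size, self.size + self.step)
        return self.size

    def failure(self) -> int:
        # Multiplicative decrease: shrink geometrically, never below min_size.
        self.size = max(self.min_size, int(self.size * self.factor))
        return self.size


def process_all(rows, do_batch, batcher, max_retries=20):
    """Process rows in batches, retrying a failed batch with a smaller size.

    do_batch stands in for one transaction; it is assumed to raise
    RuntimeError on a retryable error. Gives up after max_retries
    consecutive failures so a permanent error cannot loop forever.
    """
    pos = 0
    retries = 0
    while pos < len(rows):
        batch = rows[pos:pos + batcher.size]
        try:
            do_batch(batch)
        except RuntimeError:
            retries += 1
            if retries > max_retries:
                raise
            batcher.failure()   # shrink and retry from the same position
            continue
        retries = 0
        pos += len(batch)       # only advance after a successful batch
        batcher.success()
    return pos
```

With this shape, a run of failures quickly drives the batch size toward 1, while a run of successes restores throughput gradually, so the controller probes for the largest batch the cluster can currently commit without being configured for it in advance.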