nickva commented on a change in pull request #409: RFC for CouchDB background workers
URL: https://github.com/apache/couchdb-documentation/pull/409#discussion_r282186289
##########
File path: rfcs/007-background-workers.md
##########
@@ -0,0 +1,299 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Background workers with FoundationDB backend'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+This document describes a data model and behavior of CouchDB background workers.
+
+## Abstract
+
+CouchDB background workers are used for things like index building and
+replication. We present a generalized model that allows creation, running, and
+monitoring of these jobs. "Jobs" are represented generically such that both
+replication and indexing could take advantage of the same framework. The basic
+idea is that of a global job queue for each job type. New jobs are inserted
+into the jobs table and enqueued for execution.
+
+There are a number of workers that attempt to dequeue pending jobs and run
+them. "Running" is specific to each job type and would be different for
+replication and indexing, respectively.
+
+Workers are processes which execute jobs. They MAY be individual Erlang
+processes, but could also be implemented in Python, Java or any other
+environment with a FoundationDB client. The only coordination between workers
+happens via the database. Workers can start and stop at any time. Workers
+monitor each other for liveliness and in case some workers abruptly terminate,
+all the jobs of a dead worker are re-enqueued into the global pending queue.

Review comment:
It's job-type specific. Whether to re-enqueue the job to retry on error (with a backoff penalty) would depend on the type. This framework would only know whether the job is running somewhere, is in a pending queue waiting to run, or has completed and will not run anymore. The reason for a job being in the pending state could be a retry, or it could be that it is a new job.
The reason for completion could be either success or failure. I am OK with adding those extra states if they would generally work for indexing. They would work for replication (for reference, see https://docs.couchdb.org/en/stable/replication/replicator.html#replication-states). So the states could be:

* "pending": waiting to run in a queue
* "running": running (in a worker)
* "completed": successfully completed
* "failed": permanent failure; the job should not be re-inserted into the pending queue, and the user should delete and re-create it
* "error" (this also maps to "crashing" in the replicator case): an error has occurred, and the job is pending a retry

Should job retries and the backoff schedule also be part of the jobs framework, or should each worker type define its own policy? For example, replication jobs use a backoff that doubles on each failed job start, with a maximum of about 8 hours. What do indexing jobs do?
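A minimal sketch of the five proposed states as a Python enum with an assumed transition table. The names `JobState`, `ALLOWED_TRANSITIONS`, `transition` and `is_terminal` are hypothetical. The specific transitions are assumptions drawn from this discussion, not taken from the RFC: `running` goes back to `pending` when a worker dies, and `error` goes to `running` when the job is retried.

```python
from enum import Enum


class JobState(Enum):
    """Hypothetical job states, one per state proposed in the comment."""

    PENDING = "pending"      # waiting to run in a queue
    RUNNING = "running"      # running in a worker
    COMPLETED = "completed"  # successfully completed; never re-enqueued
    FAILED = "failed"        # permanent failure; user deletes and re-creates the job
    ERROR = "error"          # transient error ("crashing"); pending a retry


# Assumed transition table: the set of states a job may move to from each state.
# RUNNING -> PENDING models a worker dying and its jobs being re-enqueued.
# ERROR -> RUNNING models a worker picking the job up again for a retry.
ALLOWED_TRANSITIONS = {
    JobState.PENDING: {JobState.RUNNING},
    JobState.RUNNING: {
        JobState.PENDING,
        JobState.COMPLETED,
        JobState.FAILED,
        JobState.ERROR,
    },
    JobState.ERROR: {JobState.RUNNING},
    JobState.COMPLETED: set(),
    JobState.FAILED: set(),
}


def transition(current: JobState, new: JobState) -> JobState:
    """Return the new state, or raise ValueError if the move is not allowed."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {new.value}")
    return new


def is_terminal(state: JobState) -> bool:
    """A state is terminal when no further transitions are allowed from it."""
    return not ALLOWED_TRANSITIONS[state]
```

In this shape only `completed` and `failed` are terminal. That matches the point that a failed job is never re-inserted into the pending queue.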

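The doubling backoff mentioned for replication jobs can be sketched as a pure function of the number of consecutive failed starts. This is a hedged illustration, not CouchDB's actual replicator code. The function name `backoff_delay`, the 5-second `base_sec` and the exact 8-hour cap are assumptions; the comment only says the maximum is about 8 hours.

```python
# Assumed cap of exactly 8 hours; the comment says "about 8 hours".
MAX_BACKOFF_SEC = 8 * 60 * 60.0


def backoff_delay(consecutive_failures: int,
                  base_sec: float = 5.0,
                  max_sec: float = MAX_BACKOFF_SEC) -> float:
    """Delay in seconds before retrying a job after N consecutive failed starts.

    The delay doubles on each failed start and is capped at max_sec.
    base_sec is a hypothetical starting value, not taken from CouchDB.
    """
    if consecutive_failures <= 0:
        return 0.0  # no failures yet: the job may start immediately
    # Bound the exponent so long failure streaks do not build huge numbers.
    exponent = min(consecutive_failures - 1, 64)
    return min(base_sec * (2 ** exponent), max_sec)
```

If retries lived in the jobs framework, a function like this could serve as the default policy. If each worker type defines its own policy, each type would instead supply its own callable with the same signature.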