nickva commented on a change in pull request #409: RFC for CouchDB background 
workers
URL: 
https://github.com/apache/couchdb-documentation/pull/409#discussion_r282932500
 
 

 ##########
 File path: rfcs/007-background-workers.md
 ##########
 @@ -0,0 +1,299 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Background workers with FoundationDB backend'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+This document describes the data model and behavior of CouchDB background workers.
+
+## Abstract
+
+CouchDB background workers are used for things like index building and
+replication. We present a generalized model that allows creation, running, and
+monitoring of these jobs. "Jobs" are represented generically such that both
+replication and indexing could take advantage of the same framework. The basic
+idea is that of a global job queue for each job type. New jobs are inserted
+into the jobs table and enqueued for execution.
+
+There are a number of workers that attempt to dequeue pending jobs and run
+them. "Running" is specific to each job type and would be different for
+replication and indexing, respectively.
+
+Workers are processes which execute jobs. They MAY be individual Erlang
+processes, but could also be implemented in Python, Java or any other
+environment with a FoundationDB client. The only coordination between workers
+happens via the database. Workers can start and stop at any time. Workers
+monitor each other for liveness and in case some workers abruptly terminate,
+all the jobs of a dead worker are re-enqueued into the global pending queue.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+---
+
+`Job`: A unit of work, identified by a `JobId` and having a `JobType`.
+
+`Job table`: A subspace holding the list of jobs indexed by `JobId`.
+
+`Pending queue`: A queue of jobs which are ready to run. Workers may pick jobs
+from this queue and start running them.
+
+`Active jobs`: A subspace holding the list of jobs currently run by a
+particular worker.
+
+`Worker`: A job execution unit. Workers could be individual processes or groups
+of processes running on remote nodes.
+
+`Health key` : A key that a worker periodically updates to indicate that it is
+"alive" and ready to process jobs.
+
+`Versionstamp`: A 12-byte, unique, monotonically (but not sequentially)
+increasing value assigned to each committed transaction.
+
+---
+
+# Detailed Description
+
+## Data Model
+
+The main job table:
 - `("couch_workers", "jobs", JobType, JobId) = (JobState, WorkerId, Priority, CancelReq, HealthTimeout, JobInfo, JobOpts)`
+
+Pending queue:
+ - `("couch_workers", "pending", JobType, Priority, JobId) = ""`
+
+Active jobs:
 - `("couch_workers", "active", WorkerType, WorkerId, JobId) = (JobState, JobInfo, JobOpts)`
+
+Worker registration and health:
+ - `("couch_workers", WorkerType, "workers_vs") = Versionstamp`
+ - `("couch_workers", WorkerType, "workers", WorkerId) = WorkerOpts`
+ - `("couch_workers", WorkerType, "health", WorkerId) = HealthCounter`
+
+### Data model fields
+
+- `JobType`: The job type. It MAY be `replication` or `indexing`; however,
+  specific types are out of scope for this document.
+
+- `JobId` : MUST uniquely identify a job in a cluster. It MAY be a `UUID` or
+  derived from some other job parameters. In case the same job is submitted
+  multiple times, it is up to the job creation and type-specific worker logic
+  to cancel, re-start or update jobs.
+
+- `JobState` : MUST be one of `pending`, `running`, or `completed`.
+
+- `Priority` : MAY be any key which allows sorting jobs, such as a timestamp,
+  or a tag.
+
+- `WorkerId` : MUST uniquely identify a worker in a cluster. It MAY be
+  generated each time the worker is restarted, or it could be persisted across
+  worker restarts, as long as it is unique across the whole cluster.
+
+- `CancelReq` : A boolean flag indicating a request to the worker to cancel the
+  job.
+
+- `JobOpts` : Type-specific job options. Represented as a JSON object. This MAY
+  contain fields like `"source"`, `"target"`, `"dbname"`, etc.
+
+- `JobInfo` : A per-job info object. Represented as JSON. This object will
+  contain details about a job's current state. It MAY have fields such as
+  `"update_seq"`, `"error_count"`, etc.
+
+- `HealthTimeout` : The maximum interval between health updates. A worker MUST
+  update its health entry at least this often to maintain liveness.
+
+- `HealthCounter` : Workers MUST update this at least as often as the
+  `HealthTimeout` indicates. For efficiency, workers MAY use FDB atomic
+  increments here.
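
The data model above can be sketched with an in-memory dict standing in for the FDB keyspace. In a real deployment these tuples would be packed with the FDB tuple layer; `db`, `put_job`, and the literal `HealthTimeout` of 30 are illustrative names, not part of the RFC.

```python
# A minimal, in-memory sketch of the key layout described above.
db = {}

def put_job(job_type, job_id, priority, job_opts):
    # Main jobs table entry: starts out pending with no assigned worker.
    db[("couch_workers", "jobs", job_type, job_id)] = {
        "JobState": "pending", "WorkerId": None, "Priority": priority,
        "CancelReq": False, "HealthTimeout": 30,
        "JobInfo": {}, "JobOpts": job_opts,
    }
    # Matching pending-queue entry; the value is intentionally empty.
    db[("couch_workers", "pending", job_type, priority, job_id)] = ""

put_job("replication", "job-1", 5, {"source": "a", "target": "b"})
```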
+
+
+## Job Lifecycle
+
+New jobs are posted to the main jobs table along with an entry in the pending
+queue. `WorkerId` is set to `nil`. `Priority` could be a coarse timestamp, for
+example "run this job at any time during the next 5 minutes", or it MAY be a
+compound key combining a timestamp and a tag such that, say, continuous
+replications always sort before normal ones. The uniqueness of `JobId` should
+reduce contention during insertion into the queue.
+
+Workers monitor the pending queue of their matching `JobType`. That is,
+`"replication"` workers will monitor the `"replication"` pending queue,
+indexing workers the `"indexing"` queue, and so on. If there are jobs ready to
+run, they will attempt to grab one or more jobs from the pending queue and
+assign them to themselves for execution. They do that by setting the `WorkerId`
+in the jobs table entry to their worker ID, removing the job from the pending
+queue, and creating a new entry in their own `"active"` jobs area.
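
The claim step above can be sketched against the same in-memory `db` layout; in FDB this whole function would run inside a single transaction (a limit-1 range read plus the writes). `claim_one` is a hypothetical helper name.

```python
# Sketch: a worker claims the lowest-sorting pending job of its type.
def claim_one(db, job_type, worker_id):
    pending = sorted(k for k in db
                     if k[:3] == ("couch_workers", "pending", job_type))
    if not pending:
        return None
    _, _, _, priority, job_id = pending[0]
    del db[pending[0]]                         # pop from the pending queue
    job = db[("couch_workers", "jobs", job_type, job_id)]
    job["WorkerId"] = worker_id                # assign the job to ourselves
    job["JobState"] = "running"
    # Mirror the job into our own "active" jobs area.
    db[("couch_workers", "active", job_type, worker_id, job_id)] = job
    return job_id

db = {
    ("couch_workers", "jobs", "replication", "j1"):
        {"JobState": "pending", "WorkerId": None, "Priority": 1,
         "CancelReq": False, "JobInfo": {}, "JobOpts": {}},
    ("couch_workers", "pending", "replication", 1, "j1"): "",
}
claim_one(db, "replication", "w1")
```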
+
+When a job finishes running, whether through successful completion or terminal
+failure, the worker MUST remove the job from its active table and clear its
+`WorkerId` field. Then, based on type-specific behavior, it MUST do one of the
+following:
+
+ - Update the `JobState` to `"completed"` and leave the job in the jobs table.
+   It would be up to the job creation logic to inspect and remove it.
+
+ - Delete the job from the system altogether. This may be useful when running
+   `_replicate` jobs, for example.
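
The two completion options can be sketched in the same in-memory layout; `finish_job` and the `keep` flag are hypothetical names for illustration only.

```python
# Sketch: finishing a job, with the two completion options from the RFC.
def finish_job(db, job_type, worker_id, job_id, keep=True):
    # Always remove the job from our own active area first.
    del db[("couch_workers", "active", job_type, worker_id, job_id)]
    key = ("couch_workers", "jobs", job_type, job_id)
    if keep:
        # Option 1: mark completed and leave it for the creator to inspect.
        db[key]["WorkerId"] = None
        db[key]["JobState"] = "completed"
    else:
        # Option 2: delete the job altogether (e.g. transient _replicate jobs).
        del db[key]

db = {
    ("couch_workers", "jobs", "replication", "j1"):
        {"JobState": "running", "WorkerId": "w1"},
    ("couch_workers", "active", "replication", "w1", "j1"): {},
}
finish_job(db, "replication", "w1", "j1", keep=True)
```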
+
+
+If a user decides to cancel a running job, they MUST set the job's `CancelReq`
+field to `true` and then wait for the job to stop running. If the job is not
+running, they MAY directly remove the job from the pending queue and the main
+jobs table.
+
+### Worker Capacity and Long Running Jobs
+
+Some background jobs, like continuous replication jobs, might run for years
+without interruption once they start executing. In that case, workers MAY
+choose to periodically stop some jobs and re-insert them into the pending queue
+for execution, and choose to run other jobs which have been waiting. This
+behavior is job type specific. Ensuring fairness and preventing starvation is
+also job type-specific.
+
+Workers MAY decide how often and how many jobs they pull from the pending
+queue. For example, they could use a dampening scheme where the larger their
+active queue grows, the slower the rate at which they pull jobs from the queue.
+That behavior is job type specific.
+
+Another advanced scheme is where workers could start on demand based on
+the pending queue backlog. When the backlog subsides they would stop
+(with appropriate dampening factors to reduce flapping). That is again job type
+specific, and is outside the scope of this RFC.
+
+## Worker Lifecycle
+
+Workers can be dynamically started and stopped. When workers start, they MUST
+register themselves in the `"workers"` subspace based on their `WorkerType`.
+Each change to the workers list SHOULD be accompanied by an update to the
+`"workers_vs"` versionstamp. That versionstamp MAY be efficiently monitored for
+changes using a watch.
+
+
+### Worker Liveness Monitoring
+
+As long as the workers are active and ready to process jobs, they MUST
+periodically ping their health key. The period MUST be less than the
+`HealthTimeout` value.
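
The ping itself is a single key update. A minimal sketch, again assuming the in-memory `db` layout; real workers would use an FDB atomic ADD so concurrent pings never conflict, and `ping_health` is an illustrative name.

```python
# Sketch: refresh this worker's HealthCounter.
def ping_health(db, worker_type, worker_id):
    key = ("couch_workers", worker_type, "health", worker_id)
    db[key] = db.get(key, 0) + 1   # bump the HealthCounter

db = {}
ping_health(db, "replication", "w1")
ping_health(db, "replication", "w1")
```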
+
+Worker nodes MUST monitor their neighbors for liveness. When they detect an
+unresponsive neighbor, they MUST re-enqueue the dead worker's jobs into the
+pending queue and then remove the dead neighbor from the workers membership
+list. To avoid every worker monitoring all other workers, each worker monitors
+only the neighbor to its "right" in the sorted list of live workers.
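
The "neighbor to the right" rule amounts to treating the sorted worker IDs as a ring. A small sketch, with `neighbor_to_monitor` as an illustrative name:

```python
# Sketch: each worker watches the next worker ID in sorted order,
# wrapping around at the end so the monitoring ring is closed.
def neighbor_to_monitor(worker_ids, me):
    ring = sorted(worker_ids)
    return ring[(ring.index(me) + 1) % len(ring)]
```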
+
+### Liveness and Race Conditions
+
+If a worker comes back after failing to update its health entry often enough,
+and another worker has already cleared and re-submitted its jobs, it MUST
+monitor its own presence in the membership list. If it detects that it was
+removed, it MUST stop running all of its jobs and then re-execute its startup
+logic: add itself to the membership list, restart health key pings, start
+picking jobs off the pending queue, and so on.
+
+### Clean Worker Shutdown
+
+If a worker is stopped cleanly, it MUST stop accepting new jobs from the
+pending queue, stop running its current jobs, and resubmit those jobs back
+into the pending queue. Finally, it MUST clear its health entry and remove
+itself from the workers membership list.
+
+### User Job Cancellation
+
+If a user decides to cancel an active job, they MUST update the `CancelReq`
+field to `true`. Workers SHOULD periodically monitor that field and SHOULD stop
+running the job as soon as they detect it becoming `true`.
+
+## `/_active_tasks` API Compatibility
+
+The shape of the active jobs subspace was deliberately constructed so that it
+can be used to implement the `/_active_tasks` API endpoint.
+
+When the HTTP API is accessed, a single `get_range(...)` request to the
+`("couch_workers", "active")` subspace MAY be used to generate all the
+`_active_tasks` results. A job's `JobOpts` and `JobInfo` fields should contain
+most of the needed information.
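
Building the response is then one range scan over the `"active"` subspace. A sketch against the in-memory `db` layout; the result field names are illustrative, not the exact `_active_tasks` schema.

```python
# Sketch: one range read over ("couch_workers", "active") yields all tasks.
def active_tasks(db):
    prefix = ("couch_workers", "active")
    return [
        {"type": k[2], "worker": k[3], "job": k[4], "state": v["JobState"]}
        for k, v in sorted(db.items()) if k[:2] == prefix
    ]

db = {
    ("couch_workers", "active", "indexing", "w1", "j1"):
        {"JobState": "running", "JobInfo": {}, "JobOpts": {}},
    ("couch_workers", "active", "replication", "w2", "j2"):
        {"JobState": "running", "JobInfo": {}, "JobOpts": {}},
}
tasks = active_tasks(db)
```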
+
+# Advantages and Disadvantages
+
+Another alternative discussed on the mailing list is to mimic the
+pre-FoundationDB replication job scheduling model, where replication jobs are
+distributed between all the workers immediately when they are created, and each
+worker has its own separate job queue. The advantage of that model is that the
+queues and jobs are sharded evenly amongst the workers, avoiding potentially
+hot key ranges around the global queue's pushes and pops. The disadvantage is
+that it seemed like a premature optimization. Also, it would only allow for a
+completely fair scheduling algorithm, as opposed to letting workers throttle
+the number of jobs they pick up from the global queue based on load or other
+scheduling parameters.
+
+An alternative to health monitoring of individual workers is to periodically
+sort jobs based on a `LastUpdated` timestamp field and automatically re-enqueue
+jobs that have not been updated in a while. The disadvantage of that approach
+is that even though it might simplify the data model, it doesn't eliminate the
+need for workers. Since workers are already present, it might be useful to
+expose their state explicitly so it can be inspected and monitored.
+
+## Possible Future Extensions
+
+Since all job keys and values are just FDB tuples and JSON-encoded objects, it
+might be possible in the future to accept external jobs, not just jobs defined
+by the CouchDB internals. Also, since workers could be written in any language,
+as long as they can talk to the FDB cluster and follow the behavior described
+in this design, it opens up the future possibility of custom (user-defined)
+workers of different types.
+
+# Key Changes
+
+ - New job execution framework
+ - A single global job queue for each job type
+ - Workers can dynamically scale up or down
+ - Worker health monitoring to detect dead workers
+ - Potential `_active_tasks` API implementation
+
+## Applications and Modules affected
+
+Replication and indexing
 
 Review comment:
   Good call.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
