nickva commented on a change in pull request #409: RFC for CouchDB background 
workers
URL: 
https://github.com/apache/couchdb-documentation/pull/409#discussion_r293107345
 
 

 ##########
 File path: rfcs/007-background-jobs.md
 ##########
 @@ -0,0 +1,337 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Background jobs with FoundationDB'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+This document describes a data model, implementation, and an API for running
+CouchDB background jobs with FoundationDB.
+
+## Abstract
+
+CouchDB background jobs are used for things like index building, replication
+and couch-peruser processing. We present a generalized model which allows
+creation, running, and monitoring of these jobs.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+
+## General Concepts
+
+In the discussion below a job is considered to be an abstract unit of work. It
+is identified by a `JobId` and has a `JobType`. The user creates a job, and
+then it is executed by a job processor. A job processor is language-specific
+execution unit that runs the job. It could be an Erlang process, a thread, or
+just a function.
+
+The API used to create jobs is called the `Job Creation API` and the API used
+by the job processors to run jobs is called the `Job Processing API`.
+
+### Job States
+
+Jobs in the system can be in a few different states. After a job is added and
+is waiting to run, the job is considered to be `pending`. A job executed by
+a job processor is considered to be `running`. When a job is neither `running`,
+nor `pending`, it is considered to be `finished`. This is the state transition
+diagram:
+
+```
+         +------------>+
+         |             |
+         |             v
+ -->[PENDING]     [RUNNING]--->[FINISHED]
+         ^             |
+         |             |
+         +-------------+
+```
+
+
+
+### Typical API Usage
+
+The general pattern of using this API might look like:
+
+  * Job creators:
+    - Call `add/3,4` to add a job
+    - If the job needs to be resubmitted, call `resubmit/1`
+
+  * Job processors:
+    - Call `accept/1,2` and wait until it gets a job to process.
+    - Periodically call `update/2,3` to prevent the job from being re-enqueued
+      due to idleness.
+    - When done running a job, call `finish/2,3`
+
+
+### Job Creation API
+
+```
+add(Type, JobId, JobData[, ScheduledTime]) -> {ok, Job} | {error, Error}
+```
+ - Add a job to be executed by a job processor
+   - `Job` is an opaque object used to represent a job. Users of the API MUST
+     NOT rely on any fields in this map as those may change at any time.
+   - `JobData` is map with a job type-specific data in it. It MAY contain any
+     data as long as it can be properly encoded as JSON.
+   - `ScheduledTime` is an optional parameter to schedule the job to be 
executed
+     at a later time. The format is an integer seconds since UNIX epoch.
+
+```
+remove(Job) -> ok | {error, Error}
+```
+ - Remove a job. If it is running, it will be stopped.
+
+```
+resubmit(Job) -> ok | {error, Error}
+```
+ - Indicates the job should be re-submitted. That is, after the job processor
+   calls `finish/2`, the job will be re-enqueued as `pending` again. If the job
+   is not `running` this function has no effect.
+
+```
+get_job_data(Job) -> {ok, JobData} | {error, Error}
+```
+ - Get `JobData` associated with the job.
+
+```
+get_job_state(Job) -> {ok, pending | running | finished} | {error, Error}
+```
+ - Get the job's state.
+
+```
+set_type_timeout(Type, TimeoutSec) -> ok
+```
+
+ - Set the activity timeout for a job type.
+
+```
+get_type_timeout(Type)  -> {ok, TimeoutSec} | {error, Error}
+```
+
+ - Get the type timeout for a job type.
+
+```
+subscribe(Type, JobId) -> {ok, SubscriptionId, JobState}
+```
+
+ - Subscribe to receive job state updates. Notifications can be received using
+ the `wait_job_state/2` call.
+
+```
+unsubscribe(SubscriptionId) -> ok
+```
+ - Unsubscribe from receiving job state updates.
+
+```
+wait(SubscriptionId, Timeout) -> {Type, JobId, JobState, JobData} | timeout
+wait([SubscriptionId], Timeout) -> {Type, JobId, JobState, JobData} | timeout
+
+```
+ - Receive subscription notification updates from one or more subscriptions.
+
+### Job Processing API
+
+```
+accept(Type[, MaxScheduledTime]) -> {ok, Job} | {error, Error}
+```
+
+ - Get a `pending` job and start running it. If there are no pending jobs, this
+  call will block until some jobs are enqueued. The `MaxScheduledTime`
+  parameter, if specified, indicates that only jobs between now and
+  `MaxScheduledTime` time should be accepted.
+
+```
+update(Tx, Job[, JobData]) -> {ok, Job} | {error, halt | Error}
+
+```
+ - This MAY be called to update a job's `JobData`. It MUST be called at least
+   as often as the configured timeout value for the job’s type. Not doing this
+   will result in the job being re-enqueued. If `halt` is returned, the job
+   processor MUST stop running the job. Job processors MUST call `update/2,3`
+   in any write transactions it performs in order to guarantee mutual exclusion
+   that at most one job processor is executing a particular job at a time.
+
+```
+finish(Tx, Job[, JobData]) -> ok | {error, halt | Error}
+```
+ - Called by the job processor when it has finished running the job. The
+   `JobData` parameter MAY contain a final result. If `halt` is returned, it
+   means that the `JobData` value wasn't updated. Job processors MUST call
+   `update/2,3` or `finish/2,3` in any write transactions it performs in order
+   to guarantee mutual exclusion that at most one job processor is executing a
+   particular job at a time.
+
+```
+resubmit(Tx, Job[, ScheduledTime]) -> {ok, Job} | {error, halt | Error}
+```
+ - Mark the job for resubmission. The job won't be re-enqueued until
+   `finish/2,3` is called.
+
+```
+is_resubmitted(Job) -> true | false
+```
+ - Check if the job object was marked for resubmission. The job processor MAY
+   call this function on the `Job` object that gets returned from the
+   `update/2,3` function to determine if job creator had requested the job to
+   be resubmitted. The job won't actually be re-enqueued until `finish/2,3`
+   function is called.
+
+# Framework Implementation Details
+
+This section discusses how some of the framework functionality is implemented.
+
+All the coordination between job creation and job processing is done via
+FoundationDB. There is a top level `"couch_jobs"` subspace. All the subspaces
+mentioned blow will be under this subspace.
+
+Each job managed by the framework will have an entry in the main `jobs table`.
+Pending jobs are added to a `pending queue` subspace. After they are
+accepted by a jobs processor, they jobs removed from the pending queue and 
added
+to the `active jobs` subspace.
+
+Job states referenced in the API section are essentially defined based on the
+presence in any of these subspaces:
+
+ * If a job is in the `pending queue` it is considered `pending`
+ * If a job is in the `active jobs` subspace, then it is `running`
+ * If a job is not `pending` or `running` then it is considered `finished`
+
+### Activity Monitor
+
+Job processors may suddenly crash and stop running their jobs. In that case the
+framework will automatically make those jobs `pending` after a timeout. That
+ensures the jobs continue to make progress. To avoid getting re-enqueued as
+`pending` due the timeout, each job processor must periodically call the
+`update/2,3` function. That functionality is implemented by the `activity
+monitor`. It periodically watches a per-type versionstamp-ed key, then scans
+`active jobs` subspace for any `running` jobs which haven't updated their
+entries during the timeout period.
+
+### Subscription Notifications
+
+Subscription notifications are managed separately for each job type. They use
+a per-type versionstamp-ed watch to monitor which jobs have updated since
+the last time it delivered notifications to the subscribers.
+
+### Data Model
+
+ * `("couch_jobs", "data", Type, JobId) = (Sequence, JobLock, ScheduledTime, 
Resubmit, JobData)`
+ * `("couch_jobs", "pending", Type, ScheduledTime, JobId) = ""`
+ * `("couch_jobs", "watches", "pending", Type) = (Sequence, Counter)`
+ * `("couch_jobs", "watches", "activity", Type) = Sequence`
+ * `("couch_jobs", "activity_timeout", Type) = ActivityTimeout`
+ * `("couch_jobs", "activity", Type, Sequence) = JobId`
+
+
+### Job Lifecycle Implementation
+
+This section describes how the framework implements some of the API functions.
+
+ - `add/3,4` :
+   * Add the new job to the main jobs table
+   * If a job with the same `JobId` exists, return an error
+   * Update `"pending"` watch for the type with a new versionstamp and bump its
+     counter.
+   * `JobLock` is set to `null`.
+
+ - `remove/1` :
+   * Job is removed from the main jobs table
+   * Job processor during the next `update/2,3` call will get a `halt` error
+     and know to stop running the job.
+
+ - `accept/1,2` :
+   * Generate a unique `JobLock` UUID.
+   * Attempt to dequeue the item from the pending queue, then assign it the
+     `JobLock` in the jobs table.
+   * Create entry in the `"activity"` subspace.
+   * If there are no pending jobs, get a watch for the `"pending"` queue and
+     wait until it fires, then try again.
+
+ - `update/2,3`:
+   * If job is missing from the main jobs table return `halt`
+   * Check if `JobLock` matches, otherwise return `halt`
+   * Delete old `"activity"` sequence entry
+   * Maybe update `JobData`
+   * Create a new `"activity"` sequence entry and in main job table
+   * Update `"watches"` sequence for that job type
+
+ - `finish/2,3`:
+   * If job is missing from the main jobs table return `halt`
+   * Check if `JobLock` matches, otherwise returns `halt`
+   * Delete old `"activity"` sequence entry
+   * If `Resubmit` field is `true`, re-enqueue the job, and set `Resubmit` to 
`false`.
+   * Set job table's `JobLock` to `null`
+
+ - `resubmit/1`:
+   * Set the `Resubmit` field to `true`
+   * The job will be re-enqueued by the `finish/2,3` call
+
+# Advantages and Disadvantages
+
+A previous draft of this RFC discussed an implementation centered around
 
 Review comment:
   Agree.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to