davisp commented on a change in pull request #409: RFC for CouchDB background workers
URL: https://github.com/apache/couchdb-documentation/pull/409#discussion_r289970807
##########
File path: rfcs/007-background-jobs.md
##########
@@ -0,0 +1,350 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Background jobs with FoundationDB'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+This document describes a data model, implementation, and an API for running
+CouchDB background jobs with FoundationDB.
+
+## Abstract
+
+CouchDB background jobs are used for things like index building, replication
+and couch-peruser processing. We present a generalized model which allows
+creation, running, and monitoring of these jobs.
+
+The document starts with a description of the framework API in Erlang
+pseudo-code, then we show the data model, followed by the implementation
+details.
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+---
+
+`Job`: A unit of work, identified by a `JobId` and also having a `Type`.
+
+`Worker` : A language-specific execution unit that runs the job. Could be an
+Erlang process, a thread, or just a function.
+
+`Job table`: An FDB subspace holding the list of jobs.
+
+`Pending job`: A job that is waiting to run.
+
+`Pending queue` : A queue of pending jobs ordered by priority.
+
+`Running job`: A job which is currently executing. To be considered "running"
+the worker must periodically update the job's state in the global job table.
+
+`Priority`: A job's priority specifies its order in the pending queue. Priority
+can be any term that can be encoded as a key in FoundationDB's tuple layer. The
+exact value of `Priority` is job type specific. It MAY be a rough timestamp, a
+`Sequence`, a list of tags, etc.
+
+`Job re-submission` : Re-submitting a job means putting a previously running
+job back into the pending queue.
+
+`Activity monitor` : Functionality implemented by the framework which checks
+job liveness (activity). If workers don't update their status often enough,
+the activity monitor will re-enqueue their jobs as pending. This ensures jobs make
+progress even if some workers terminate unexpectedly.
+
+`JobState`: Describes the current state of the job. The possible values are
+`"running"`, `"pending"`, and `"finished"`. These are the minimal number of
+states needed to describe a job's behavior with respect to this framework. Each
+job type MAY have additional, type specific states, such as `"failed"`,
+`"error"`, `"retrying"`, etc.
+
+`Sequence`: a 13 byte value formed by combining the current `Incarnation` of
+the database and the `Versionstamp` of the transaction. Sequences are
+monotonically increasing even when a database is relocated across FoundationDB
+clusters. See (RFC002) for a full explanation.
+
+---
+
+# Framework API
+
+This section describes the job creation and worker implementation APIs. It doesn't
+describe how the framework is implemented. The intended audience is CouchDB
+developers using this framework to implement background jobs for indexing,
+replication, and couch-peruser.
+
+Both the job creation and the worker implementation APIs use a `JobOpts` map to
+represent a job. It MAY also contain these top level fields:
+
+ * `"priority"` : The value of this field will contain the `Priority` value of
+   the job. `Priority` is job-type specific.
+ * `"data"`: An opaque object (map), from the framework's point of view,
+   containing job-type specific data. It MAY contain an update sequence, or an
+   error message, for example.
+ * `"cancel"` : Boolean field defaulting to `false`. If `true` indicates the
+   user intends to stop a job's execution.
+ * `"resubmit"` : Boolean field defaulting to `false`. If `true` indicates
+   the job should be re-submitted.

Review comment:
Both `cancel` and `resubmit` seem odd to have as boolean k/v fields that a user has to worry about. Given that we have these as functions for job life cycle management, I'd not expect to see them as stateful fields.
