wohali commented on a change in pull request #123: Documentation for the scheduling replicator
URL: https://github.com/apache/couchdb-documentation/pull/123#discussion_r114017406
 
 

 ##########
 File path: src/replication/replicator.rst
 ##########
 @@ -154,98 +203,253 @@ the following order:
 .. code-block:: javascript
 
     {
-        "_id": "doc_A",
-        "source":  "http://myserver.com:5984/foo";,
-        "target":  "http://user:pass@localhost:5984/bar";
+        "_id": "my_rep",
+        "source":  "http://user:pass@localhost:5984/foo";,
+        "target":  "http://user:pass@localhost:5984/bar";,
+        "create_target":  true,
+        "continuous": true
     }
 
 and
 
 .. code-block:: javascript
 
     {
-        "_id": "doc_B",
-        "source":  "http://myserver.com:5984/foo";,
-        "target":  "http://user:pass@localhost:5984/bar";
-    }
-
-Both describe exactly the same replication (only their ``_ids`` differ). In this
-case document ``doc_A`` triggers the replication, getting updated by CouchDB
-with the fields ``_replication_state``, ``_replication_state_time`` and
-``_replication_id``, just like it was described before. Document ``doc_B``
-however, is only updated with one field, the ``_replication_id`` so it will
-look like this:
-
-.. code-block:: javascript
-
-    {
-        "_id": "doc_B",
-        "source":  "http://myserver.com:5984/foo";,
+        "_id": "my_rep_dup",
+        "source":  "http://user:pass@localhost:5984/foo";,
         "target":  "http://user:pass@localhost:5984/bar";,
-        "_replication_id":  "c0ebe9256695ff083347cbf95f93e280"
+        "create_target":  true,
+        "continuous": true
     }
 
-While document ``doc_A`` will look like this:
-
-.. code-block:: javascript
-
-    {
-        "_id": "doc_A",
-        "source":  "http://myserver.com:5984/foo";,
-        "target":  "http://user:pass@localhost:5984/bar";,
-        "_replication_id":  "c0ebe9256695ff083347cbf95f93e280",
-        "_replication_state":  "triggered",
-        "_replication_state_time":  "2011-02-17T20:22:02+01:00"
-    }
+Both describe exactly the same replication (only their ``_ids``
+differ). In this case document ``my_rep`` triggers the
+replication, while ``my_rep_dup`` fails. Inspecting
+``_scheduler/docs`` explains exactly why it failed:
+
+.. code-block:: json
+
+        {
+            "database": "_replicator",
+            "doc_id": "my_rep_dup",
+            "error_count": 1,
+            "id": null,
+            "info": "Replication 
`a81a78e822837e66df423d54279c15fe+continuous+create_target` specified by 
document `my_rep_dup` already started, triggered by document `my_rep` from db 
`_replicator`",
+            "last_updated": "2017-04-05T21:41:51Z",
+            "source": "http://adm:*****@localhost:5984/foo/";,
+            "start_time": "2017-04-05T21:41:51Z",
+            "state": "failed",
+            "target": "http://adm:*****@localhost:5984/bar/";
+        }
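+
+The record above is one entry from the output of a request such as
+the following (the host and port shown are the usual defaults):
+
+.. code-block:: http
+
+    GET /_scheduler/docs HTTP/1.1
+    Host: localhost:5984
+    Accept: application/json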
 
-Note that both document get exactly the same value for the ``_replication_id``
-field. This way you can identify which documents refer to the same replication -
-you can for example define a view which maps replication IDs to document IDs.
+Notice the state for this replication is ``failed``. Unlike
+``crashing``, the ``failed`` state is terminal. As long as both
+documents are present, the replicator will not retry the
+``my_rep_dup`` replication. Another reason for reaching the
+``failed`` state is a malformed document, for example one where the
+worker process count is specified as a string
+(``"worker_processes": "a few"``) instead of an integer.
+
+Replication Scheduler
+=====================
+
+Once replication jobs are created, they are managed by the scheduler.
+The scheduler is the replication component which periodically stops
+some jobs and starts others. This behavior makes it possible to have
+a larger number of jobs than the cluster can run at any one time.
+Replication jobs which keep failing are penalized and forced to
+wait. The wait time increases exponentially with each consecutive
+failure.
+
+When deciding which jobs to stop and which to start, the scheduler
+uses a round-robin algorithm to ensure fairness. Jobs which have been
+running the longest time will be stopped, and jobs which have been
+waiting the longest time will be started.
+
+.. note:: Non-continuous (normal) replications are treated differently
+          once they start running. See the :ref:`Normal vs Continuous
+          Replications` section for more information.
+
+The behavior of the scheduler can be configured via the ``max_jobs``,
+``interval`` and ``max_churn`` options. See the :ref:`Replicator
+configuration section <config/replicator>` for additional information.
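+
+For illustration, these settings live in the ``[replicator]`` section
+of the configuration; the values below are examples only, not
+recommendations (see the configuration reference for the defaults):
+
+.. code-block:: ini
+
+    [replicator]
+    ; maximum number of replication jobs the scheduler runs at once
+    max_jobs = 500
+    ; how often, in milliseconds, the scheduler re-evaluates jobs
+    interval = 60000
+    ; maximum number of jobs stopped and started per interval
+    max_churn = 20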
+
+Replication states
+==================
+
+During their life-cycle, replication jobs pass through various
+states. This is a diagram of all the states and the transitions
+between them:
+
+.. figure:: ../../images/replication-state-diagram.svg
+     :align: center
+     :alt: Replication state diagram
+
+     Replication state diagram
+
+Green and red shapes represent replication job states.
+
+Yellow shapes represent external APIs; that is how users interact
+with the replicator. Writing documents to ``_replicator`` is the
+preferred way of creating replications, but posting to the
+``_replicate`` HTTP endpoint is also supported.
+
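+As a quick sketch (credentials and database names below are
+placeholders), posting to the ``_replicate`` endpoint looks like
+this:
+
+.. code-block:: http
+
+    POST /_replicate HTTP/1.1
+    Host: localhost:5984
+    Content-Type: application/json
+
+    {
+        "source": "http://user:pass@localhost:5984/foo",
+        "target": "http://user:pass@localhost:5984/bar"
+    }
+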
+.. note:: Replications created through the ``_replicate`` endpoint
+   will not survive a restart of the cluster node where they are
+   running, and once completed they will be removed from the system,
+   so their completion state is preserved only in the
+   ``_scheduler/jobs`` endpoint.
+
+White shapes indicate internal API boundaries and show how the
+replicator is structured internally. There are two stages of
+processing: in the first, replication documents are parsed and
+become replication jobs; the second is the scheduler. The scheduler
+runs replication jobs, periodically stopping and starting some. Jobs
+posted via the ``_replicate`` endpoint bypass the first component
+and go straight to the scheduler.
+
+States descriptions
+-------------------
+
+Before explaining the details of each state, it is worth noting the
+color and shape conventions used in the diagram:
+
+`Green` vs `red` partitions states into "healthy" and "unhealthy",
+respectively. Unhealthy states indicate something has gone wrong and
+might need the user's attention.
+
+`Rectangle` vs `oval` separates "terminal" states from "non-terminal"
+ones. Terminal states are those which will not transition to other
+states any more. Informally, jobs in a terminal state will not be
+retried and don't consume memory or CPU resources.
+
+ * ``Initializing``: Indicates the replicator has noticed the change
+   from the replication document. Jobs should transition quickly
+   through this state. Being stuck here for a while could mean there
+   is an internal error.
+
+ * ``Failed``: Replication document could not be processed and turned
+   into a valid replication job for the scheduler. This state is
+   terminal and requires user intervention to fix the problem. Typical
+   reasons for ending up in this state is a malformed document. For
+   example specifying an integer for a parameter which accepts a
+   boolean. Another reason could be specifying a duplicate
 
 Review comment:
   Sentence fragments, revise. "For example, specifying an integer for a parameter which accepts a boolean will result in this state. Another reason for failure could be..."
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
