This is an automated email from the ASF dual-hosted git repository.

jiahuili430 pushed a commit to branch update-replication-docs
in repository https://gitbox.apache.org/repos/asf/couchdb.git
commit a6531cda2b27f2b95e6ca59c9f363171eb6d7e27
Author: Jiahui Li <[email protected]>
AuthorDate: Thu Jan 29 21:55:37 2026 -0600

    Docs: Fix docs about replication
---
 src/couch_replicator/README.md          | 19 ++++++++-----------
 src/docs/src/replication/conflicts.rst  |  2 +-
 src/docs/src/replication/protocol.rst   |  4 ++--
 src/docs/src/replication/replicator.rst | 16 ++++++++--------
 4 files changed, 19 insertions(+), 22 deletions(-)

diff --git a/src/couch_replicator/README.md b/src/couch_replicator/README.md
index cd1b7bd9e..8a631219d 100644
--- a/src/couch_replicator/README.md
+++ b/src/couch_replicator/README.md
@@ -40,7 +40,7 @@ A description of each child:
    control algorithm to converge on the channel capacity. Implemented using a
    16-way sharded ETS table to maintain connection state. The table sharding
    code is split out to `couch_replicator_rate_limiter_tables` module. The
-   purpose of the module it to maintain and continually estimate sleep
+   purpose of the module is to maintain and continually estimate sleep
    intervals for each connection represented as a `{Method, Url}` pair. The
    interval is updated accordingly on each call to `failure/1` or `success/1`
    calls. For a successful request, a client should call `success/1`. Whenever
@@ -79,7 +79,7 @@ A description of each child:
    jobs running less than `replicator.max_jobs` (default 500). So the
    functions does these operations (actual code paste):

-   ```
+   ```erl
    Running = running_job_count(),
    Pending = pending_job_count(),
    stop_excess_jobs(State, Running),
    start_pending_jobs(State, Running, Pending),
@@ -116,7 +116,7 @@ A description of each child:
    interesting part is how the scheduler picks which jobs to stop and which
    ones to start:

-   * Stopping: When picking jobs to stop the scheduler will pick longest
+   * Stopping: When picking jobs to stop the scheduler will pick the longest
      running continuous jobs first. The sorting callback function to get the
      longest running jobs is unsurprisingly called `longest_running/2`.
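As an aside, the `stop_excess_jobs`/`start_pending_jobs` logic quoted in the
hunk above boils down to simple arithmetic against the `replicator.max_jobs`
limit. A minimal Python sketch (the function name and signature here are
illustrative, not the actual Erlang implementation):

```python
def plan_schedule(running: int, pending: int, max_jobs: int = 500) -> tuple[int, int]:
    """Return (jobs to stop, jobs to start) for one scheduler cycle.

    Illustrative only: mirrors the idea behind stop_excess_jobs/2 and
    start_pending_jobs/3 without any of the real job-selection logic.
    """
    to_stop = max(0, running - max_jobs)     # anything above the limit stops
    free_slots = max(0, max_jobs - running)  # room left under the limit
    to_start = min(pending, free_slots)      # start pending jobs into free slots
    return to_stop, to_start
```

For example, with 510 jobs running over a limit of 500, the sketch stops 10
and starts none; with 480 running and 50 pending, it stops none and starts 20.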
     To pick the longest running jobs it looks at the most recent `started`
@@ -143,14 +143,14 @@ A description of each child:
    on how this algorithm works.

    The last part is how the scheduler treats jobs which keep crashing. If a
-   job is started but then crashes then that job is considered unhealthy. The
+   job is started but then crashes, that job is considered unhealthy. The
    main idea is to penalize such jobs such that they are forced to wait an
    exponentially larger amount of time with each consecutive crash. A central
    part to this algorithm is determining what forms a sequence of consecutive
    crashes. If a job starts then quickly crashes, and after its next start it
    crashes again, then that would become a sequence of 2 consecutive crashes.
    The penalty then would be calculated by `backoff_micros/1` function where
-   the consecutive crash count would end up as the exponent. However for
+   the consecutive crash count would end up as the exponent. However, for
    practical concerns there is also maximum penalty specified and that's the
    equivalent of 10 consecutive crashes. Timewise it ends up being about 8
    hours. That means even a job which keep crashing will still get a chance to
@@ -189,10 +189,10 @@ A description of each child:
    is handling of upgrades from the previous version of the replicator when
    transient states were written to the documents. Two such states were
    `triggered` and `error`. Both of those states are removed from the document
-   then then update proceeds in the regular fashion. `failed` documents are
+   then update proceeds in the regular fashion. `failed` documents are
    also ignored here. `failed` is a terminal state which indicates the document
    was somehow unsuitable to become a replication job (it was malformed or a
-   duplicate). Otherwise the state update proceeds to `process_updated/2`.
+   duplicate). Otherwise, the state update proceeds to `process_updated/2`.
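The crash-penalty scheme described in the hunk above can be sketched in a few
lines: the consecutive crash count is the exponent, capped at the equivalent
of 10 crashes, which the README says works out to roughly 8 hours. A Python
sketch with made-up constants (the real calculation is `backoff_micros/1` in
the Erlang scheduler; the base interval here is chosen only so that the cap
lands near 8 hours):

```python
def crash_penalty_seconds(consecutive_crashes: int,
                          base: float = 28.0,
                          max_exponent: int = 10) -> float:
    """Illustrative exponential crash penalty.

    The consecutive crash count becomes the exponent, capped at the
    equivalent of 10 consecutive crashes. With the assumed 28 s base,
    the cap is 28 * 2**10 = 28672 s, i.e. just under 8 hours, so even a
    permanently crashing job is retried roughly every 8 hours.
    """
    exponent = min(consecutive_crashes, max_exponent)
    return base * 2 ** exponent
```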
    `process_updated/2` is where replication document updates are parsed and
    translated to `#rep{}` records. The interesting part here is that the
@@ -236,7 +236,7 @@ A description of each child:
    1. Filter fetching code has failed. In that case worker returns an error.
       But because the error could be a transient network error, another
       worker is started to try again. It could fail and return an error
-      again, then another one is started and so on. However each consecutive
+      again, then another one is started and so on. However, each consecutive
       worker will do an exponential backoff, not unlike the scheduler code.
       `error_backoff/1` is where the backoff period is calculated. Consecutive
       errors are held in the `errcnt` field in the ETS table.
@@ -255,6 +255,3 @@ A description of each child:
    cluster, it's ok to check filter changes often. But when there are lots of
    replications running, having each one checking their filter often is not a
    good idea.
-
-
-
diff --git a/src/docs/src/replication/conflicts.rst b/src/docs/src/replication/conflicts.rst
index 704adf62d..315c88707 100644
--- a/src/docs/src/replication/conflicts.rst
+++ b/src/docs/src/replication/conflicts.rst
@@ -347,7 +347,7 @@ to determine for each document whether it is in a conflicting state:
 View map functions
 ==================

-Views only get the winning revision of a document. However they do also get a
+Views only get the winning revision of a document. However, they do also get a
 ``_conflicts`` member if there are any conflicting revisions. This means you
 can write a view whose job is specifically to locate documents with conflicts.
 Here is a simple map function which achieves this:

diff --git a/src/docs/src/replication/protocol.rst b/src/docs/src/replication/protocol.rst
index 9f967f6ee..f361bc334 100644
--- a/src/docs/src/replication/protocol.rst
+++ b/src/docs/src/replication/protocol.rst
@@ -1255,7 +1255,7 @@ Documents-Attachments
    and may handle it as stream with lesser memory footprint.
    Content-Length: 87

        1. Cook spaghetti
-       2. Cook meetballs
+       2. Cook meatballs
        3. Mix them
        4. Add tomato sauce
        5. ...
@@ -1480,7 +1480,7 @@ one by one without any serialization overhead.
    Content-Length: 87

        1. Cook spaghetti
-       2. Cook meetballs
+       2. Cook meatballs
        3. Mix them
        4. Add tomato sauce
        5. ...

diff --git a/src/docs/src/replication/replicator.rst b/src/docs/src/replication/replicator.rst
index 65fd298f5..3d8b6bf7b 100644
--- a/src/docs/src/replication/replicator.rst
+++ b/src/docs/src/replication/replicator.rst
@@ -490,13 +490,13 @@ Server restart

 When CouchDB is restarted, it checks its ``_replicator`` databases and
 restarts replications described by documents if they are not already in
-in a ``completed`` or ``failed`` state. If they are, they are ignored.
+a ``completed`` or ``failed`` state. If they are, they are ignored.

 Clustering
 ==========

 In a cluster, replication jobs are balanced evenly among all the nodes
-nodes such that a replication job runs on only one node at a time.
+such that a replication job runs on only one node at a time.

 Every time there is a cluster membership change, that is when nodes
 are added or removed, as it happens in a rolling reboot, replicator
@@ -760,12 +760,12 @@ There are multiple ways to specify usernames and passwords for replication endpo
         ...
     }

-   This is the prefererred format as it allows including characters like ``@``, ``:``
-   and others in the username and password fields.
+   This is the preferred format as it allows including characters like ``@``,
+   ``:`` and others in the username and password fields.

 - In the userinfo part of the endpoint URL. This allows for a more compact
-  endpoint represention however, it prevents using characters like ``@`` and ``:``
-  in usernames or passwords:
+  endpoint representation however, it prevents using characters like ``@``
+  and ``:`` in usernames or passwords:

   .. code-block:: javascript

@@ -795,14 +795,14 @@ There are multiple ways to specify usernames and passwords for replication endpo

    This method has the downside of the going through the extra step of base64
    encoding. In addition, it could give the impression that it encrypts or
-   hides the credentials so it could encourage invadvertent sharing and
+   hides the credentials so it could encourage inadvertent sharing and
    leaking credentials.

 When credentials are provided in multiple forms, they are selected in the
 following order:

 - ``"auth": {"basic": {...}}`` object
 - URL userinfo
-- ``"Authorization: Basic ..."`` header.
+- ``"Authorization: Basic ..."`` header

 First, the ``auth`` object is checked, and if credentials are defined there,
 they are used. If they are not, then URL userinfo is checked. If credentials
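The precedence order in the hunk above is a straightforward first-match
selection over the three credential sources. A hedged Python sketch of that
selection (the function and its string return values are hypothetical, purely
to make the ordering concrete; this is not CouchDB's implementation):

```python
from typing import Optional

def pick_credential_source(auth_object: Optional[dict],
                           url_userinfo: Optional[str],
                           auth_header: Optional[str]) -> Optional[str]:
    """Return which credential source wins, following the documented order:
    the ``auth`` object first, then URL userinfo, then the Authorization
    header. Inputs are hypothetical stand-ins for the three sources."""
    if auth_object and auth_object.get("basic"):
        return "auth object"
    if url_userinfo:
        return "url userinfo"
    if auth_header:
        return "authorization header"
    return None  # no credentials provided in any form
```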
