[
https://issues.apache.org/jira/browse/COUCHDB-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15211101#comment-15211101
]
Nick Vatamaniuc commented on COUCHDB-2975:
------------------------------------------
We might have to increase the intensity threshold. One common use case that
will trigger it is replicating from one source to multiple targets: if the
source fails, all of those replications fail as well. I tested it with 1
source and 200 targets, then killed the source and noticed the supervisors
were restarted:
([email protected])4> rpc:multicall(erlang, whereis, [couch_replicator_job_sup]).
{[<0.352.0>,<26873.355.0>,<26910.354.0>],[]} % before deleting source
([email protected])5> rpc:multicall(erlang, whereis, [couch_replicator_job_sup]).
{[<0.5617.4>,<26873.7071.3>,<26910.8924.3>],[]} % after deleting source
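To illustrate how the intensity threshold produces the restart seen above,
here is a minimal sketch of a simple_one_for_one supervisor whose children all
crash at once. The module name, function names and the source_down exit reason
are made up for this example; it is not the actual couch_replicator_job_sup
code, only an assumption about the general shape of the problem.

```erlang
-module(intensity_demo).
-behaviour(supervisor).
-export([start_link/2, start_job/0, crash_all/1]).
-export([init/1]).

%% Hypothetical demo supervisor, not CouchDB's couch_replicator_job_sup.
start_link(Intensity, Period) ->
    supervisor:start_link(?MODULE, {Intensity, Period}).

init({Intensity, Period}) ->
    %% The supervisor gives up if more than Intensity restarts happen
    %% within Period seconds.
    ChildSpec = {job, {?MODULE, start_job, []},
                 transient, 5000, worker, [?MODULE]},
    {ok, {{simple_one_for_one, Intensity, Period}, [ChildSpec]}}.

%% Stand-in for a replication job: a process that just waits.
start_job() ->
    {ok, spawn_link(fun() -> receive stop -> ok end end)}.

%% Simulate the shared source going away: every job exits abnormally.
crash_all(Sup) ->
    Pids = [Pid || {_, Pid, _, _} <- supervisor:which_children(Sup),
                   is_pid(Pid)],
    [exit(Pid, source_down) || Pid <- Pids],
    ok.
```

With an intensity of 3 and 5 jobs, crash_all/1 causes more restarts than the
supervisor allows within the period, so the supervisor itself exits with
reason shutdown and has to be restarted by its parent.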
I saw we already have some protection against repeated failed replication
restarts in the form of the "max_replication_retry_count" parameter. By
default it is 10, so 10 failed starts of a particular replication will cancel
that replication. Once the replication starts successfully, its remaining
retry count is reset back to the max (10).
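The retry accounting described above can be sketched as a small pure module.
This is only an illustration of the counting rule; the module, the function
names and the {RetriesLeft, Max} state shape are assumptions and are not taken
from the replicator code.

```erlang
-module(retry_demo).
-export([new/1, on_start_failed/1, on_start_ok/1]).

%% State is {RetriesLeft, Max}. Hypothetical, for illustration only.
new(Max) ->
    {Max, Max}.

%% A failed start uses up one retry; using up the last one cancels.
on_start_failed({Left, _Max}) when Left =< 1 ->
    cancel;
on_start_failed({Left, Max}) ->
    {retry, {Left - 1, Max}}.

%% A successful start resets the counter back to the maximum.
on_start_ok({_Left, Max}) ->
    {Max, Max}.
```

Starting from retry_demo:new(10), nine consecutive failures return
{retry, State} and the tenth returns cancel, while a successful start at any
point puts the state back to {10, 10}.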
Another thing: I noticed that replications will restart even without
{{transient}} supervisors if they are killed with an exit reason other than
'kill' (brutal kill). So if the goal is just to restart them, sending
exit(Pid, meh) should suffice.
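For reference, the two kinds of exit signal look different to whatever is
monitoring the job. The sketch below uses a throwaway process rather than a
real replication job, and the module and function names are invented for the
example; it only shows standard exit/2 behaviour and does not claim to be the
mechanism by which the replicator restarts jobs.

```erlang
-module(exit_demo).
-export([observe/1]).

%% Spawn a throwaway process, send it exit(Pid, Reason) and return the
%% exit reason that a monitor sees.
observe(Reason) ->
    {Pid, Ref} = spawn_monitor(fun() -> receive stop -> ok end end),
    exit(Pid, Reason),
    receive
        {'DOWN', Ref, process, Pid, Seen} -> Seen
    after 1000 ->
        timeout
    end.
```

Here exit_demo:observe(meh) returns meh, while exit_demo:observe(kill) returns
killed, since a 'kill' signal cannot be trapped and is always reported as
killed.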
> Automatically restart replication jobs if they crash
> ----------------------------------------------------
>
> Key: COUCHDB-2975
> URL: https://issues.apache.org/jira/browse/COUCHDB-2975
> Project: CouchDB
> Issue Type: Improvement
> Components: Replication
> Reporter: Robert Newson
>
> We currently use the temporary restart strategy for replication jobs, which
> means if they crash they are not restarted.
> Instead, let's use the transient restart strategy, ensuring they are
> restarted on abnormal termination, while still allowing these tasks to end
> successfully on completion or cancellation.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)