[ 
https://issues.apache.org/jira/browse/COUCHDB-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15211101#comment-15211101
 ] 

Nick Vatamaniuc commented on COUCHDB-2975:
------------------------------------------

We might have to increase the restart intensity threshold. One common use case
that will trigger it is replicating one source to multiple targets: if the
source fails, all of those replications fail as well. I tested it with 1 source
and 200 targets, then killed the source and noticed that the supervisors were
restarted (the couch_replicator_job_sup pids changed on every node):

(...)4> rpc:multicall(erlang, whereis, [couch_replicator_job_sup]).
{[<0.352.0>,<26873.355.0>,<26910.354.0>],[]} % before deleting source
(...)5> rpc:multicall(erlang, whereis, [couch_replicator_job_sup]).
{[<0.5617.4>,<26873.7071.3>,<26910.8924.3>],[]} % after deleting source
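To make the intensity concern concrete, here is a small model of how an OTP
supervisor decides to give up: if more than MaxR restarts happen within MaxT
seconds, it terminates its children and then itself. The module and function
names are hypothetical, the window check only approximates OTP's own
bookkeeping, and no particular intensity/period values for
couch_replicator_job_sup are assumed.

```erlang
-module(intensity_sketch).
-export([gives_up/3]).

%% Returns true if a supervisor configured with intensity MaxR and
%% period MaxT (seconds) would give up while processing Restarts,
%% a list of restart timestamps in ascending order.
gives_up(Restarts, MaxR, MaxT) ->
    gives_up(Restarts, MaxR, MaxT, []).

gives_up([], _MaxR, _MaxT, _Window) ->
    false;
gives_up([T | Rest], MaxR, MaxT, Window) ->
    %% Keep only earlier restarts still inside the period, then add this one.
    Recent = [T | [W || W <- Window, T - W =< MaxT]],
    case length(Recent) > MaxR of
        true -> true;          % too many restarts: the supervisor terminates
        false -> gives_up(Rest, MaxR, MaxT, Recent)
    end.
```

For example, 200 jobs failing at the same moment against intensity 10 and
period 10, i.e. intensity_sketch:gives_up(lists:duplicate(200, 0), 10, 10),
returns true. Setting the intensity above the number of jobs expected to fail
together is one way to keep a single source failure from taking the
supervisor down.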

I saw we already have some protection against repeatedly failing replication
restarts: the “max_replication_retry_count” parameter, which defaults to 10.
So 10 failed start attempts for a particular replication will cancel that
replication. Once it starts successfully, the remaining retry count is reset
back to the maximum (10).
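A sketch of that retry budget, assuming each failed start decrements a counter
and a successful start resets it to the configured maximum. The module and
function names are hypothetical; this models the behaviour described above and
is not the couch_replicator code.

```erlang
-module(retry_sketch).
-export([on_start_result/3, run/2]).

%% Hypothetical model of the max_replication_retry_count behaviour.
%% Left is the remaining retry budget, Max the configured maximum.
on_start_result(ok, _Left, Max) ->
    {running, Max};                      % success resets the budget
on_start_result(error, Left, _Max) when Left =< 1 ->
    cancelled;                           % budget exhausted: cancel
on_start_result(error, Left, _Max) ->
    {retry, Left - 1}.

%% Feed a sequence of start results through the model, starting from a
%% full budget. Returns cancelled or {ok, RemainingBudget}.
run(Results, Max) ->
    run(Results, Max, Max).

run([], Left, _Max) ->
    {ok, Left};
run([Result | Rest], Left, Max) ->
    case on_start_result(Result, Left, Max) of
        cancelled -> cancelled;
        {running, Left1} -> run(Rest, Left1, Max);
        {retry, Left1} -> run(Rest, Left1, Max)
    end.
```

With the default of 10, retry_sketch:run(lists:duplicate(10, error), 10)
returns cancelled, while nine failures followed by a successful start leave
the budget back at 10.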

Another thing: I noticed that replications restart even without {{transient}}
supervisors if they are killed with an exit reason other than 'kill' (a brutal
kill). So if the goal is just to restart them, sending them exit(Pid, meh)
should suffice.
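The difference comes from how exit signals work: a process that traps exits
receives exit(Pid, meh) as an {'EXIT', From, meh} message and keeps running,
whereas exit(Pid, kill) cannot be trapped and terminates it with reason
killed. The sketch below only demonstrates that BEAM behaviour with a
hypothetical module; that the replication job traps exits and reacts to such a
message by restarting is an assumption inferred from the observation above.

```erlang
-module(exit_sketch).
-export([probe/1]).

%% Spawns a process that traps exits, sends it exit(Pid, Reason) and
%% reports whether the signal arrived as a message or killed the process.
probe(Reason) ->
    Parent = self(),
    Pid = spawn(fun() ->
                        process_flag(trap_exit, true),
                        Parent ! {ready, self()},
                        receive
                            {'EXIT', _From, R} -> Parent ! {trapped, self(), R}
                        end
                end),
    receive {ready, Pid} -> ok end,
    Ref = monitor(process, Pid),
    exit(Pid, Reason),
    receive
        {trapped, Pid, Trapped} ->
            demonitor(Ref, [flush]),     % drop the 'DOWN' from its normal exit
            {trapped, Trapped};
        {'DOWN', Ref, process, Pid, DownReason} ->
            {died, DownReason}
    end.
```

Here exit_sketch:probe(meh) returns {trapped, meh}, while
exit_sketch:probe(kill) returns {died, killed}.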

> Automatically restart replication jobs if they crash
> ----------------------------------------------------
>
>                 Key: COUCHDB-2975
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-2975
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Replication
>            Reporter: Robert Newson
>
> We currently use the temporary restart strategy for replication jobs, which 
> means if they crash they are not restarted.
> Instead, let's use the transient restart strategy, ensuring they are 
> restarted on abnormal termination, while still allowing these tasks to end 
> successfully on completion or cancellation.
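The proposed change hinges on the OTP restart types: a temporary child is never
restarted, while a transient child is restarted only if it exits with a reason
other than normal, shutdown or {shutdown, Term}. The module, function names and
child spec values below are hypothetical and only illustrate that rule.

```erlang
-module(restart_sketch).
-export([restarted/2, child_spec/2]).

%% Whether a supervisor restarts a child with the given restart type
%% after it exits with Reason, following the OTP supervisor rules.
restarted(permanent, _Reason) -> true;
restarted(temporary, _Reason) -> false;
restarted(transient, normal) -> false;
restarted(transient, shutdown) -> false;
restarted(transient, {shutdown, _}) -> false;
restarted(transient, _Reason) -> true.

%% Hypothetical child spec for a replication job; only `restart` matters here.
child_spec(Id, {M, F, A}) ->
    #{id => Id,
      start => {M, F, A},
      restart => transient,
      shutdown => 5000,
      type => worker,
      modules => [M]}.
```

Note that a transient child killed with exit(Pid, kill) exits with reason
killed and would therefore also be restarted.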



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
