GitHub user rnewson commented on a diff in the pull request:
https://github.com/apache/couchdb-couch-replicator/pull/34#discussion_r58082617
--- Diff: src/couch_replicator_manager.erl ---
@@ -134,9 +134,14 @@ replication_error(#rep{id = {BaseId, _} = RepId}, Error) ->
 continue(#rep{doc_id = null}) ->
     {true, no_owner};
 continue(#rep{id = RepId}) ->
-    Owner = gen_server:call(?MODULE, {owner, RepId}, infinity),
-    {node() == Owner, Owner}.
-
+    case rep_state(RepId) of
+        nil ->
+            {false, nonode};
+        #rep_state{rep = #rep{db_name = DbName, doc_id = DocId}} ->
+            Node = node(),
+            Owner = owner(DbName, DocId, [Node | nodes()]),
--- End diff --

The reason we stopped using `nodes()` and instead tracked the live set of
nodes in server state was to avoid situations where, momentarily, all nodes
declined to run a job (each thinking that some other node would run it).
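To make that race concrete, here is a minimal, hypothetical sketch. It is not the actual `owner/3` from `couch_replicator_manager`. It assumes ownership is a deterministic function of a job key and a node list, and that each node runs the job only if it picks itself, mirroring the `[Node | nodes()]` call in the diff. The module `owner_sketch`, the functions `pick_owner/2` and `should_run/3`, the integer key, and the atom node names `a`, `b`, `c` are all invented for illustration.

```erlang
%% Hypothetical sketch, not the real couch_replicator_manager code.
%% Ownership is a pure function of the job key and whatever node list
%% each node passes in, so nodes with different views can disagree.
-module(owner_sketch).
-export([pick_owner/2, should_run/3]).

%% Pick an owner deterministically: dedupe and sort the candidate nodes,
%% then index into the sorted list with an integer job key.
pick_owner(Key, Nodes) when is_integer(Key), Key >= 0, Nodes =/= [] ->
    Sorted = lists:usort(Nodes),
    lists:nth((Key rem length(Sorted)) + 1, Sorted).

%% A node runs the job only if, from its own view of the cluster
%% (analogous to [node() | nodes()]), it computes itself as the owner.
should_run(Self, Key, View) ->
    pick_owner(Key, [Self | View]) =:= Self.
```

With key 5, suppose node `a` has already noticed that `c` left (its view is `[b]`), so it picks `b` and declines, while node `b` has not noticed yet (its view is `[a, c]`), so it picks `c` and also declines. Since `c` is gone, nobody runs the job. If both nodes pass the same list, `[a, b]`, they agree that `b` owns the job and `b` runs it.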

Now, it's possible that tracking the live set didn't help, but I don't know
how we can be certain either way. Obviously we intend to write a better
distributed job scheduler in the near future, so unwinding this is probably
no big deal.

If you believe it's safe to remove the protection described above, then
please modify your second commit to remove the `live` member of the server
state and all the code that manipulates it, since it will be vestigial.