Ewen Cheslack-Postava created KAFKA-2743:
--------------------------------------------

             Summary: Forwarding task reconfigurations in Copycat can deadlock 
with rebalances and has no backoff
                 Key: KAFKA-2743
                 URL: https://issues.apache.org/jira/browse/KAFKA-2743
             Project: Kafka
          Issue Type: Bug
          Components: copycat
            Reporter: Ewen Cheslack-Postava
            Assignee: Ewen Cheslack-Postava
             Fix For: 0.9.0.0


There are two issues with the way we're currently forwarding task 
reconfigurations. First, the forwarding is performed synchronously in the 
DistributedHerder's main processing loop. If node A forwards a task 
reconfiguration and node B has started a rebalance process, we can end up with 
a distributed deadlock, because node A will be blocking on the HTTP request in 
the thread that would otherwise handle heartbeating and rebalancing.
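
One possible direction, sketched below, is to hand the blocking request to a 
separate thread so the main loop stays free to heartbeat and rebalance. This 
is only an illustration, not the actual fix: AsyncForwardSketch, LeaderClient, 
and forwardAsync are hypothetical names and do not come from the Copycat 
code base.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class AsyncForwardSketch {
    /** Hypothetical stand-in for the blocking HTTP request to the leader. */
    interface LeaderClient {
        void forwardTaskConfigs(String connector) throws Exception;
    }

    // Dedicated thread, so the caller's loop never waits on the network.
    private final ExecutorService forwardExecutor = Executors.newSingleThreadExecutor();

    /** Queues the request and returns immediately; the Future reports the outcome. */
    Future<Void> forwardAsync(LeaderClient client, String connector) {
        Callable<Void> request = () -> {
            client.forwardTaskConfigs(connector); // may block; runs off the main loop
            return null;
        };
        return forwardExecutor.submit(request);
    }

    /** Interrupts any in-flight request and stops the forwarding thread. */
    void shutdown() {
        forwardExecutor.shutdownNow();
    }
}
```

The caller can check the returned Future on a later pass through its loop 
instead of waiting on it, which is the property the deadlock scenario above 
needs.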

Second, we currently retry aggressively with no backoff. In some cases the 
node that is currently thought to be the leader will legitimately be down (it 
shut down and the node sending the request has not rebalanced yet), so we need 
some backoff to avoid unnecessarily hammering the network and producing the 
huge log files that result from constant errors.
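
As a rough illustration of what "some backoff" could look like, here is a 
capped exponential delay calculation. The class name, method name, and the 
250 ms / 10 s values are assumptions chosen for the example, not settings 
taken from Copycat.

```java
class BackoffSketch {
    /**
     * Delay before retry number `attempt` (0-based): initialMs doubled once
     * per prior attempt, never exceeding maxMs. Assumes initialMs > 0.
     */
    static long backoffMs(int attempt, long initialMs, long maxMs) {
        long delay = Math.min(initialMs, maxMs);
        for (int i = 0; i < attempt && delay < maxMs; i++) {
            // Compare against maxMs / 2 before doubling to avoid overflow.
            delay = (delay > maxMs / 2) ? maxMs : delay * 2;
        }
        return delay;
    }

    public static void main(String[] args) {
        // Hypothetical values: start at 250 ms, cap at 10 s.
        for (int attempt = 0; attempt < 8; attempt++) {
            System.out.println("attempt " + attempt + " -> "
                    + backoffMs(attempt, 250L, 10_000L) + " ms");
        }
    }
}
```

With these example values the delays grow as 250, 500, 1000, 2000, 4000, and 
8000 ms and then stay at 10000 ms, so a leader that is legitimately down 
produces a handful of errors per minute rather than a continuous stream.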



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
