Ewen Cheslack-Postava created KAFKA-2743:
--------------------------------------------
Summary: Forwarding task reconfigurations in Copycat can deadlock with rebalances and has no backoff
Key: KAFKA-2743
URL: https://issues.apache.org/jira/browse/KAFKA-2743
Project: Kafka
Issue Type: Bug
Components: copycat
Reporter: Ewen Cheslack-Postava
Assignee: Ewen Cheslack-Postava
Fix For: 0.9.0.0
There are two issues with the way we currently forward task
reconfigurations. First, the forwarding is performed synchronously in the
DistributedHerder's main processing loop. If node A forwards a task
reconfiguration while node B has started a rebalance, the two can end up in a
distributed deadlock: node A blocks on the HTTP request in the very thread that
would otherwise handle heartbeating and rebalancing, so it cannot take part in
the rebalance that node B is waiting on.
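One way to keep the main loop responsive is to hand the blocking request to a separate thread and only poll for its completion. The Java sketch below is illustrative only: `AsyncForwardingSketch`, `forwardToLeader`, and the tick counter are hypothetical stand-ins, not the actual DistributedHerder code, and the 200 ms sleep merely simulates a slow leader.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: not Copycat's actual DistributedHerder API.
public class AsyncForwardingSketch {
    // Dedicated thread for blocking requests to the leader, so the main loop never waits on them.
    private final ExecutorService forwardExecutor = Executors.newSingleThreadExecutor();
    // Counts main-loop iterations, standing in for heartbeat/rebalance handling.
    final AtomicInteger ticks = new AtomicInteger();

    // Stand-in for the blocking HTTP request to the leader (hypothetical).
    static String forwardToLeader(String connector) throws InterruptedException {
        Thread.sleep(200); // simulate a slow or unresponsive leader
        return "forwarded " + connector;
    }

    // Submits the forwarding request on the executor and returns immediately.
    CompletableFuture<String> forwardAsync(String connector) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return forwardToLeader(connector);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }, forwardExecutor);
    }

    // Main loop: keeps doing its own work while the forward is in flight.
    String runUntilForwarded(String connector) throws Exception {
        try {
            CompletableFuture<String> pending = forwardAsync(connector);
            while (!pending.isDone()) {
                ticks.incrementAndGet(); // heartbeating/rebalancing would happen here
                Thread.sleep(10);
            }
            return pending.get();
        } finally {
            forwardExecutor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        AsyncForwardingSketch herder = new AsyncForwardingSketch();
        String result = herder.runUntilForwarded("my-connector");
        System.out.println(result + " after " + herder.ticks.get() + " loop iterations");
    }
}
```

Running `main` shows the loop iterating many times while the simulated request is in flight; with the request made synchronously inside the loop, the loop would make no progress until the request returned.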
Second, we currently retry aggressively with no backoff. In some cases the
node believed to be the leader is legitimately down (it shut down and the node
sending the request has not rebalanced yet), so we need some backoff to avoid
needlessly hammering the network and producing huge log files from constant
errors.
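A capped exponential backoff is one common way to address the second issue. The sketch below is a hypothetical helper (`BackoffRetrySketch`, `retry`, `backoffMs`, and the delay values are all assumptions, not Copycat settings): each failed attempt doubles the wait up to a ceiling, so a leader that stays down sees a bounded request rate instead of a tight retry loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical sketch of retry with capped exponential backoff; names and
// parameters are illustrative, not Copycat configuration options.
public class BackoffRetrySketch {
    // Delay before retry number `attempt` (0-based): initialMs * 2^attempt, capped at maxMs.
    // Doubling stops once the cap is reached, so large attempt counts cannot overflow.
    static long backoffMs(int attempt, long initialMs, long maxMs) {
        long delay = initialMs;
        for (int i = 0; i < attempt && delay < maxMs; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMs);
    }

    // Calls `request` up to maxAttempts times, sleeping between failures.
    // Each delay used is appended to `delaysOut` so callers can inspect it.
    static <T> T retry(Callable<T> request, int maxAttempts, long initialMs, long maxMs,
                       List<Long> delaysOut) throws Exception {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return request.call();
            } catch (Exception e) {
                last = e;
                if (attempt == maxAttempts - 1) break; // no sleep after the final failure
                long delay = backoffMs(attempt, initialMs, maxMs);
                delaysOut.add(delay);
                Thread.sleep(delay); // wait instead of immediately retrying (and logging) again
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        List<Long> delays = new ArrayList<>();
        // Simulated leader: unreachable for the first three calls, then succeeds.
        String result = retry(() -> {
            if (++calls[0] < 4) throw new IllegalStateException("leader unreachable");
            return "ok";
        }, 5, 10, 40, delays);
        System.out.println(result + " after " + calls[0] + " attempts, delays " + delays);
    }
}
```

With the simulated leader failing three times, `main` waits 10, 20, then 40 ms before the fourth attempt succeeds. Capping the delay keeps recovery reasonably prompt once a rebalance identifies the new leader.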