[jira] [Resolved] (KAFKA-2743) Forwarding task reconfigurations in Copycat can deadlock with rebalances and has no backoff

Guozhang Wang (JIRA) Thu, 05 Nov 2015 08:44:59 -0800

     [ https://issues.apache.org/jira/browse/KAFKA-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Guozhang Wang resolved KAFKA-2743.
----------------------------------
    Resolution: Fixed

Issue resolved by pull request 422
[https://github.com/apache/kafka/pull/422]

> Forwarding task reconfigurations in Copycat can deadlock with rebalances and 
> has no backoff
> -------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-2743
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2743
>             Project: Kafka
>          Issue Type: Bug
>          Components: copycat
>            Reporter: Ewen Cheslack-Postava
>            Assignee: Ewen Cheslack-Postava
>             Fix For: 0.9.0.0
>
>
> There are two issues with the way we're currently forwarding task 
> reconfigurations. First, the forwarding is performed synchronously in the 
> DistributedHerder's main processing loop. If node A forwards a task 
> reconfiguration and node B has started a rebalance process, we can end up 
> with a distributed deadlock because node A will be blocking on the HTTP request 
> in the thread that would otherwise handle heartbeating and rebalancing.
> Second, currently we just retry aggressively with no backoff. In some cases 
> the node that is currently thought to be the leader will legitimately be down 
> (it shut down and the node sending the request hasn't rebalanced yet), so we 
> need some backoff to avoid unnecessarily hammering the network and generating 
> huge log files from the constant errors.
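
For illustration only, here is a minimal sketch of the two remedies the description calls for: moving the forwarding request off the main processing loop, and retrying with a capped exponential backoff. This is not the code from pull request 422. The class ForwardingSketch, the methods backoffMs and forwardAsync, and the constants BASE_MS and MAX_MS are all hypothetical names chosen for this example, and the Callable stands in for whatever HTTP request the herder would actually send.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ForwardingSketch {
    // Assumed values for illustration; not taken from Kafka's configuration.
    static final long BASE_MS = 250L;
    static final long MAX_MS = 10_000L;

    // Hypothetical helper: delay before retry number `retry` (0-based),
    // doubling from baseMs each time and capped at maxMs.
    static long backoffMs(int retry, long baseMs, long maxMs) {
        if (retry < 0) {
            throw new IllegalArgumentException("retry must be >= 0");
        }
        int shift = Math.min(retry, 20); // bound the shift to avoid overflow
        return Math.min(maxMs, baseMs << shift);
    }

    // Runs `request` on `executor` rather than on the caller's thread, so a
    // caller such as a main processing loop is never blocked by the request.
    // On failure (false result or exception) it reschedules itself after a
    // backoff delay, up to maxAttempts attempts in total.
    static void forwardAsync(ScheduledExecutorService executor,
                             Callable<Boolean> request,
                             int attempt,
                             int maxAttempts) {
        long delay = attempt == 0 ? 0L : backoffMs(attempt - 1, BASE_MS, MAX_MS);
        executor.schedule(() -> {
            boolean ok;
            try {
                ok = request.call();
            } catch (Exception e) {
                ok = false; // treat any error as a failed attempt
            }
            if (!ok && attempt + 1 < maxAttempts) {
                forwardAsync(executor, request, attempt + 1, maxAttempts);
            }
        }, delay, TimeUnit.MILLISECONDS);
    }
}
```

A caller would pass an executor created with Executors.newSingleThreadScheduledExecutor() and start with attempt 0. Because the request and its retries run on that executor, the calling thread stays free to handle heartbeats and rebalances, and a leader that is legitimately down is contacted at most once per backoff interval.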



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
