[ 
https://issues.apache.org/jira/browse/KAFKA-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede closed KAFKA-262.
-------------------------------

> Bug in the consumer rebalancing logic causes one consumer to release 
> partitions that it does not own
> ----------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-262
>                 URL: https://issues.apache.org/jira/browse/KAFKA-262
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.7
>            Reporter: Neha Narkhede
>            Assignee: Neha Narkhede
>             Fix For: 0.7.1
>
>         Attachments: kafka-262-v3.patch, kafka-262.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The consumer maintains a cache of topics and partitions it owns along with 
> the fetcher queues corresponding to those. But while releasing partition 
> ownership, this cache is not cleared. This leads the consumer to release a 
> partition that it does not own any more. This can also lead the consumer to 
> commit offsets for partitions that it no longer consumes from. 
> The rebalance operation goes through the following steps:
> 1. close fetchers
> 2. commit offsets
> 3. release partition ownership. 
> 4. rebalance, add topic, partition and fetcher queues to the topic registry, 
> for all topics that the consumer process currently wants to own. 
> 5. If the consumer runs into a conflict for any topic or partition, the 
> rebalancing attempt fails, and it goes back to step 1.
> Say there are two consumers in a group, c1 and c2. Both are consuming topic1, 
> which has partitions 0-0, 0-1 and 1-0. Say c1 owns 0-0 and 0-1 and c2 owns 1-0.
> 1. Broker 1 goes down. This triggers a rebalancing attempt in c1 and c2.
> 2. c1 releases partition ownership and, during step 4 (above), fails to 
> rebalance.
> 3. Meanwhile, c2 completes rebalancing successfully, and owns partition 0-1 
> and starts consuming data.
> 4. c1 starts its next rebalancing attempt and, during step 3 (above), releases 
> partition 0-1, which is still in its uncleared cache even though c2 now owns 
> it. During step 4, it owns partition 0-0 again, and starts consuming data.
> 5. Effectively, rebalancing has completed successfully, but there is no owner 
> for partition 0-1 registered in Zookeeper.
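
The scenario above can be replayed with a small model. This is only a sketch
under stated assumptions, not Kafka's consumer code: the Consumer class, its
method names, and the clear_cache_on_release flag are hypothetical, and a plain
dict stands in for the ownership registry in Zookeeper. The report does not say
why c1's first attempt conflicts, so the sketch assumes c1 works from a stale
assignment that still includes partition 0-1.

```python
class Consumer:
    """Hypothetical stand-in for a consumer; not Kafka's actual class."""

    def __init__(self, name, registry, owned, clear_cache_on_release):
        self.name = name
        self.registry = registry        # partition -> owner (models Zookeeper)
        self.cache = set(owned)         # local cache of owned partitions
        self.clear_cache_on_release = clear_cache_on_release

    def release_ownership(self):
        # Step 3: delete the ownership entry of every cached partition.
        # Assumption: the entry is deleted without checking the current owner.
        for partition in self.cache:
            self.registry.pop(partition, None)
        if self.clear_cache_on_release:
            self.cache.clear()          # the behaviour the report says is missing

    def claim(self, partitions):
        # Step 4: claim partitions; any existing owner is a conflict (step 5).
        if any(p in self.registry for p in partitions):
            return False
        for p in partitions:
            self.registry[p] = self.name
            self.cache.add(p)
        return True


def run(clear_cache_on_release):
    registry = {"0-0": "c1", "0-1": "c1", "1-0": "c2"}
    c1 = Consumer("c1", registry, ["0-0", "0-1"], clear_cache_on_release)
    c2 = Consumer("c2", registry, ["1-0"], clear_cache_on_release)

    # Broker 1 goes down; both consumers start rebalancing.
    c1.release_ownership()
    c2.release_ownership()
    c2.claim(["0-1"])                   # c2 completes and owns 0-1
    c1.claim(["0-0", "0-1"])            # c1 conflicts on 0-1 (assumed stale view)

    # c1 retries from step 1.
    c1.release_ownership()              # with a stale cache this drops c2's 0-1
    c1.claim(["0-0"])
    return registry


print(run(clear_cache_on_release=False))
print(run(clear_cache_on_release=True))
```

With the cache left uncleared, the first call ends with no registered owner for
0-1, matching step 5 of the scenario. Clearing the cache on release, as the
second call does, keeps c2's ownership intact in this model.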



