[ https://issues.apache.org/jira/browse/KAFKA-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neha Narkhede closed KAFKA-262.
-------------------------------

> Bug in the consumer rebalancing logic causes one consumer to release partitions that it does not own
> ----------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-262
>                 URL: https://issues.apache.org/jira/browse/KAFKA-262
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.7
>            Reporter: Neha Narkhede
>            Assignee: Neha Narkhede
>             Fix For: 0.7.1
>
>         Attachments: kafka-262-v3.patch, kafka-262.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The consumer maintains a cache of the topics and partitions it owns, along
> with the fetcher queues corresponding to those. But when partition ownership
> is released, this cache is not cleared. This leads the consumer to release a
> partition that it does not own any more. It can also lead the consumer to
> commit offsets for partitions that it no longer consumes from.
>
> The rebalance operation goes through the following steps:
> 1. Close fetchers.
> 2. Commit offsets.
> 3. Release partition ownership.
> 4. Rebalance: add topic, partition and fetcher queues to the topic registry
>    for all topics that the consumer process currently wants to own.
> 5. If the consumer runs into a conflict for any topic or partition, the
>    rebalancing attempt fails, and it goes back to step 1.
>
> Say there are 2 consumers in a group, c1 and c2. Both are consuming topic1
> with partitions 0-0, 0-1 and 1-0. Say c1 owns 0-0 and 0-1, and c2 owns 1-0.
> 1. Broker 1 goes down. This triggers a rebalancing attempt in c1 and c2.
> 2. c1 releases partition ownership and, during step 4 (above), fails to
>    rebalance.
> 3. Meanwhile, c2 completes rebalancing successfully, owns partition 0-1,
>    and starts consuming data.
> 4. c1 starts its next rebalancing attempt and, during step 3 (above),
>    releases partition 0-1, which is still in its stale cache even though c2
>    now owns it. During step 4, it owns partition 0-0 again and starts
>    consuming data.
> 5. Effectively, rebalancing has completed successfully, but there is no
>    owner for partition 0-1 registered in ZooKeeper.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
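
The c1/c2 scenario from the issue description can be replayed with a small toy model. This is only an illustrative sketch in Python, not Kafka's consumer code: the `Consumer` class, the `owners` dict standing in for the ownership registry in ZooKeeper, and the `clear_cache_on_release` flag are all hypothetical names, and clearing the cache on release is assumed to be the remedy because the description names the uncleared cache as the cause.

```python
class Consumer:
    """Toy consumer. `owners` is a shared dict standing in for ZooKeeper."""

    def __init__(self, name, owners, clear_cache_on_release):
        self.name = name
        self.owners = owners          # partition -> name of the owning consumer
        self.cache = set()            # partitions this consumer believes it owns
        self.clear_cache_on_release = clear_cache_on_release

    def release_ownership(self):
        # Step 3: drop the ownership entry for every cached partition,
        # without checking who actually owns it.
        for partition in self.cache:
            self.owners.pop(partition, None)
        if self.clear_cache_on_release:
            self.cache.clear()        # assumed remedy: forget released partitions

    def claim(self, partitions):
        # Step 4: claim the wanted partitions; any conflict fails the attempt.
        if any(p in self.owners for p in partitions):
            return False
        for p in partitions:
            self.owners[p] = self.name
        self.cache = set(partitions)
        return True


def run_scenario(clear_cache_on_release):
    owners = {"0-0": "c1", "0-1": "c1", "1-0": "c2"}
    c1 = Consumer("c1", owners, clear_cache_on_release)
    c2 = Consumer("c2", owners, clear_cache_on_release)
    c1.cache = {"0-0", "0-1"}
    c2.cache = {"1-0"}

    # Broker 1 goes down. c1 releases ownership, then its attempt fails in
    # step 4 (the failure is simulated here by simply not claiming anything).
    c1.release_ownership()

    # c2 completes its rebalance and takes over partition 0-1.
    c2.release_ownership()
    assert c2.claim(["0-1"])

    # c1 retries: it releases whatever is still cached, then claims 0-0.
    c1.release_ownership()
    assert c1.claim(["0-0"])
    return owners


print("cache not cleared:", run_scenario(clear_cache_on_release=False))
print("cache cleared:    ", run_scenario(clear_cache_on_release=True))
```

With the cache left intact, c1's retry deletes c2's entry for 0-1, so only 0-0 ends up with a registered owner. With the cache cleared on release, the retry has nothing stale to delete and both partitions keep their owners.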