[
https://issues.apache.org/jira/browse/KAFKA-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hang Qi updated KAFKA-1475:
---------------------------
Attachment: 5055aeee-zk.txt
> Kafka consumer stops LeaderFinder/FetcherThreads, but application does not
> know
> -------------------------------------------------------------------------------
>
> Key: KAFKA-1475
> URL: https://issues.apache.org/jira/browse/KAFKA-1475
> Project: Kafka
> Issue Type: Bug
> Components: clients, consumer
> Affects Versions: 0.8.0
> Environment: linux, rhel 6.4
> Reporter: Hang Qi
> Assignee: Neha Narkhede
> Labels: consumer
> Attachments: 5055aeee-zk.txt
>
>
> We encounter an issue of consumers not consuming messages in production. (
> this consumer has its own consumer group, and just consumes one topic of 3
> partitions.)
> Based on the logs, we have following findings:
> 1. Zookeeper session expires, kafka highlevel consumer detected this event,
> and released old broker parition ownership and re-register consumer.
> 2. Upon creating ephemeral path in Zookeeper, it found that the path still
> exists, and try to read the content of the node.
> 3. After read back the content, it founds the content is same as that it is
> going to write, so it logged as "[ZkClient-EventThread-428-ZK/kafka]
> (kafka.utils.ZkUtils$) -
> /consumers/consumerA/ids/consumerA-1400815740329-5055aeee exists with value {
> "pattern":"static", "subscription":{ "TOPIC": 1},
> "timestamp":"1400846114845", "version":1 } during connection loss; this is
> ok", and doing nothing.
> 4. After that, it throws exception indicated that the cause is
> "org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode =
> NoNode for /consumers/consumerA/ids/consumerA-1400815740329-5055aeee" during
> rebalance.
> 5. After all retries failed, it gave up retry and left the
> LeaderFinderThread, FetcherThread stopped.
> Step 3 looks very weird, checking the code, there is timestamp contains in
> the stored data, it may be caused by Zookeeper issue.
> But what I am wondering is that whether it is possible to let application
> (kafka client users) to know that the underline LeaderFinderThread and
> FetcherThread are stopped, like allowing application to register some
> callback or throws some exception (by invalidate the KafkaStream iterator for
> example)? For me, it is not reasonable for the kafka client to shutdown
> everything and wait for next rebalance, and let application wait on
> iterator.hasNext() without knowing that there is something wrong underline.
> I've read about twiki about kafka 0.9 consumer rewrite, and there is a
> ConsumerRebalanceCallback interface, but I am not sure how long it will take
> to be ready, and how long it will take for us to migrate.
> Please help to look at this issue. Thanks very much!
--
This message was sent by Atlassian JIRA
(v6.2#6252)