[ https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855098#comment-15855098 ]
Jason Gustafson commented on KAFKA-4739:
----------------------------------------

[~sagar8192] Unfortunately, there is no such option. Traditionally, Kafka clients attempt to handle broker failures internally. This usually means a metadata refresh and a reconnect, which is exactly what the client appears to be doing here. We normally expect the assigned partitions to be spread across multiple brokers, so a failure fetching from any particular broker should only affect the availability of the partitions it was hosting. This is typically what you want, since a broker failure will cause another broker to take over its partitions. There is little an application can do in these cases anyway, other than possibly sending an alert. Nevertheless, this behavior is often contested and may change, especially as some of the automatic behavior (such as topic auto-creation) is retired.

One small request: the logs seem to have sanitized broker ids. Can you ensure that they have all been updated consistently? The puzzling thing is that the requests appear to be timing out on the client after 30s, yet you've enabled 120s in the config. Are you sure the 120s is correct? In which config did you enable "request_timeout_ms = 300001" (the broker doesn't have such a config)? It's also strange that multiple fetches are cancelled after a disconnect. The consumer should only ever have one fetch in flight for each broker. I don't have a ready explanation for that. Could there be some details left out of the logs? We might get more information if you enable TRACE level logging.
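As the comment notes, `request.timeout.ms` is a consumer-side setting with no broker-side equivalent, so it belongs in the client's properties, not in the broker's `server.properties`. A minimal sketch of where the 120s value under discussion would go (the bootstrap address and group id are placeholders, not values from this issue):

```java
import java.util.Properties;

public class ConsumerTimeoutConfig {

    public static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("group.id", "example-group");           // placeholder group id
        // request.timeout.ms is a *client* config: it bounds how long the
        // consumer waits for a response before cancelling the in-flight
        // request and reconnecting. Setting it here, in the consumer's own
        // properties, is what makes the 120s value take effect; placing it
        // in the broker's server.properties would do nothing.
        props.put("request.timeout.ms", "120000");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build();
        System.out.println("request.timeout.ms = "
                + props.getProperty("request.timeout.ms"));
    }
}
```

If the client still cancels requests after 30s with this in place, that would suggest the 120s setting is not actually reaching the consumer (e.g. it was applied to a different properties file).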
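Since the client exposes no option to surface a broker failure directly, the "possibly sending an alert" approach above has to be implemented by the application around its poll loop. A hypothetical helper (not part of the Kafka API) that flags when successive polls have returned no records for longer than a threshold:

```java
/**
 * Tracks progress of a consumer poll loop. Call onPoll() after each
 * poll(); it reports a likely stall once no records have arrived for
 * longer than the configured threshold. Timestamps are passed in
 * explicitly so the logic is easy to test without a broker.
 */
public class PollStallDetector {
    private final long stallThresholdMs;
    private long lastProgressMs;

    public PollStallDetector(long stallThresholdMs, long nowMs) {
        this.stallThresholdMs = stallThresholdMs;
        this.lastProgressMs = nowMs;
    }

    /** Returns true when the loop appears stalled and an alert is warranted. */
    public boolean onPoll(int recordCount, long nowMs) {
        if (recordCount > 0) {
            lastProgressMs = nowMs; // records arrived: progress was made
            return false;
        }
        return nowMs - lastProgressMs > stallThresholdMs;
    }
}
```

In the application, `onPoll(records.count(), System.currentTimeMillis())` would be called after each `poll()`, and a `true` result routed to whatever alerting the operator uses; this does not fix the loop described in this issue, it only makes it visible.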
> KafkaConsumer poll going into an infinite loop
> ----------------------------------------------
>
>                 Key: KAFKA-4739
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4739
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.9.0.1
>            Reporter: Vipul Singh
>
> We are seeing an issue with our kafka consumer where it seems to go into an
> infinite loop while polling, trying to fetch data from kafka. We are seeing
> the heartbeat requests on the broker from the consumer, but nothing else from
> the kafka consumer.
> We enabled debug level logging on the consumer, and see these logs:
> https://gist.github.com/neoeahit/757bff7acdea62656f065f4dcb8974b4
> And this just goes on. The way we have been able to replicate this issue, is
> by restarting the process in multiple successions.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)