[ https://issues.apache.org/jira/browse/FLINK-11792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882771#comment-16882771 ]
Konstantin Knauf commented on FLINK-11792:
------------------------------------------

[~becket_qin] Thanks for having a look at this. https://kafka.apache.org/protocol.html#protocol_partitioning describes that the client needs to check manually for metadata updates in case of a broker failure (all requests for a partition go to the leader). `KafkaConsumer#assign`, which we use, also states that "As such, there will be no rebalance operation triggered when group membership or cluster and topic metadata change." We update the metadata only in the `KafkaPartitionDiscoverer`, which is invoked periodically. I will try to find the actual stack traces from back then.

> Make KafkaConsumer more resilient to Kafka Broker Failures
> -----------------------------------------------------------
>
>                 Key: FLINK-11792
>                 URL: https://issues.apache.org/jira/browse/FLINK-11792
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / Kafka
>    Affects Versions: 1.7.2
>            Reporter: Konstantin Knauf
>            Priority: Major
>
> When consuming from a topic with replication factor > 1, the
> FlinkKafkaConsumer could continue reading from this topic when a single
> broker fails, by "simply" switching to the new leader(s) for all lost
> partitions after Kafka failover. Currently, the KafkaConsumer will most
> likely throw an exception, as topic metadata is only periodically fetched
> from the Kafka cluster.
>

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
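The failure window discussed in the comment above can be illustrated with a minimal, self-contained sketch: when partition metadata is refreshed only on a periodic timer (as with Flink's partition discovery), a consumer keeps targeting a dead leader until the next discovery run. All class and broker names below are hypothetical stand-ins, not Flink's or Kafka's actual APIs.

```python
class FakeCluster:
    """Tracks the current leader broker for a single partition."""
    def __init__(self, leader):
        self.leader = leader

    def fail_over(self, new_leader):
        # Broker failure: leadership moves to a replica on another broker.
        self.leader = new_leader


class PeriodicDiscoverer:
    """Refreshes the cached leader only every `interval` ticks, mimicking
    metadata discovery driven by a timer rather than by broker failures."""
    def __init__(self, cluster, interval):
        self.cluster = cluster
        self.interval = interval
        self.cached_leader = cluster.leader
        self.last_refresh = 0

    def leader_for_fetch(self, now):
        if now - self.last_refresh >= self.interval:
            self.cached_leader = self.cluster.leader
            self.last_refresh = now
        return self.cached_leader


cluster = FakeCluster(leader="broker-1")
discoverer = PeriodicDiscoverer(cluster, interval=10)

cluster.fail_over("broker-2")                # broker-1 dies at t=0
stale = discoverer.leader_for_fetch(now=5)   # before the next refresh
fresh = discoverer.leader_for_fetch(now=10)  # next discovery run

print(stale, fresh)  # the fetch at t=5 still targets the dead broker-1
```

Until the discovery interval elapses, every fetch goes to the stale leader and fails, which matches the observation that the consumer "will most likely throw an exception" on broker failure rather than switching to the new leader immediately.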