[
https://issues.apache.org/jira/browse/FLINK-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15938390#comment-15938390
]
Tzu-Li (Gordon) Tai commented on FLINK-6006:
--------------------------------------------
[~gyfora] that's very odd then. I double-checked the code, but so far
couldn't pinpoint what else could have gone wrong.
Could you perhaps post a relevant part of the log? That might be helpful.
Ideally, that would be the parts where the consumer was started for the first
time and initially picked up partitions, and also the parts after restore. You
can also DM me ([email protected]) if you prefer to share it privately.
I'll try to check whether there are any more issues with this before 1.2.1.
> Kafka Consumer can lose state if queried partition list is incomplete on
> restore
> --------------------------------------------------------------------------------
>
> Key: FLINK-6006
> URL: https://issues.apache.org/jira/browse/FLINK-6006
> Project: Flink
> Issue Type: Bug
> Components: Kafka Connector, Streaming Connectors
> Reporter: Tzu-Li (Gordon) Tai
> Assignee: Tzu-Li (Gordon) Tai
> Priority: Blocker
> Fix For: 1.1.5, 1.2.1
>
>
> In 1.1.x and 1.2.x, the FlinkKafkaConsumer queries the partition list
> on restore. Then, only the restored state of partitions that exist in the
> queried list is used to initialize the fetcher's state holders.
> If in any case the returned partition list is incomplete (i.e. missing
> partitions that existed before, perhaps due to temporary ZK / broker
> downtime), then the state of the missing partitions is dropped and cannot be
> recovered anymore.
> In 1.3-SNAPSHOT, this is fixed by changes in FLINK-4280, so only 1.1 and 1.2
> are affected.
> We can backport some of the behavioural changes there to 1.1 and 1.2.
> Generally, we should not depend on the current partition list in Kafka when
> restoring, but just restore all previous state into the fetcher's state
> holders.
> This would therefore also require some checking of how the consumer threads /
> Kafka clients behave when their assigned partitions cannot be reached.
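
For illustration only, the difference between the two restore strategies
described in the issue can be sketched roughly as below. This is not the
connector's actual code: the class RestoreSketch, both method names, and the
use of plain strings as partition identifiers are assumptions made up for this
example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch; names and types are illustrative, not Flink's API.
public class RestoreSketch {

    // Behaviour described for 1.1.x / 1.2.x: keep only the restored offsets
    // whose partition appears in the freshly queried partition list.
    static Map<String, Long> restoreFiltered(Map<String, Long> restored,
                                             Set<String> queried) {
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, Long> entry : restored.entrySet()) {
            if (queried.contains(entry.getKey())) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    // Proposed behaviour: restore all previous state, independent of what
    // the current partition query happens to return.
    static Map<String, Long> restoreAll(Map<String, Long> restored) {
        return new TreeMap<>(restored);
    }

    public static void main(String[] args) {
        Map<String, Long> restored = new TreeMap<>();
        restored.put("topic-0", 100L);
        restored.put("topic-1", 250L);
        restored.put("topic-2", 75L);

        // Simulate an incomplete query result: "topic-1" is temporarily missing.
        Set<String> queried = new HashSet<>(Arrays.asList("topic-0", "topic-2"));

        System.out.println("filtered: " + restoreFiltered(restored, queried));
        System.out.println("all:      " + restoreAll(restored));
    }
}
```

In the filtered variant the offset for "topic-1" is silently dropped, which is
the state loss described above. The second variant keeps it, but leaves open
how the client behaves while that partition is unreachable.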