Tzu-Li (Gordon) Tai created FLINK-6006:
------------------------------------------

             Summary: Kafka Consumer can lose state if queried partition list 
is incomplete on restore
                 Key: FLINK-6006
                 URL: https://issues.apache.org/jira/browse/FLINK-6006
             Project: Flink
          Issue Type: Bug
          Components: Kafka Connector, Streaming Connectors
            Reporter: Tzu-Li (Gordon) Tai
            Assignee: Tzu-Li (Gordon) Tai
            Priority: Blocker
             Fix For: 1.1.5, 1.2.1


In 1.1.x and 1.2.x, the FlinkKafkaConsumer performs partition list querying on 
restore. Then, only restored state of partitions that exists in the queried 
list is used to initialize the fetcher's state holders.

If in any case the returned partition list is incomplete (i.e. missing 
partitions that existed before, perhaps due to temporary ZK / broker downtime), 
then the state of the missing partitions is dropped and cannot be recovered 
anymore.

In 1.3-SNAPSHOT, this is fixed by changes in FLINK-4280, so only 1.1 and 1.2 is 
affected.

We can backport some of the behavioural changes there to 1.1 and 1.2. 
Generally, we should not depend on the current partition list in Kafka when 
restoring, but just restore all previous state into the fetcher's state 
holders. 

This would therefore also require some checking on how the consumer threads / 
Kafka clients behave when its assigned partitions cannot be reached.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to