Jun Qin created FLINK-26018:
-------------------------------

             Summary: Late events in the new KafkaSource
                 Key: FLINK-26018
                 URL: https://issues.apache.org/jira/browse/FLINK-26018
             Project: Flink
          Issue Type: Bug
            Reporter: Jun Qin
         Attachments: message in kafka.txt, 
taskmanager_10.28.0.131_33249-b3370c_log

There is an issue with the new KafkaSource connector in Flink 1.14: when one 
task consumes messages from multiple topic partitions (statically created, 
with timestamps in order), it may start with one partition and advance 
watermarks before the data from the other partitions arrives. In this case, the 
early messages in the other partitions may unnecessarily be considered late.

I discussed this with [~renqs]. It seems that the new KafkaSource only adds a 
partition to {{WatermarkMultiplexer}} when it receives data from that 
partition. In contrast, FlinkKafkaConsumer adds all known partitions before it 
fetches any data.
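
To illustrate the difference, here is a minimal sketch in plain Java. It is a 
toy model, not Flink code: the class {{LateEventSketch}}, the method 
{{countLate}}, and the sample timestamps are all hypothetical. It assumes the 
combined watermark is the minimum of the per-partition watermarks, that it 
never moves backwards, and that an event is late when its timestamp is below 
the combined watermark at arrival.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Toy model of combining per-partition watermarks; NOT Flink code. */
class LateEventSketch {

    /**
     * Replays events in arrival order and counts the late ones.
     * Each event is {partition, timestamp}. An event counts as late when its
     * timestamp is below the combined watermark at the moment it arrives
     * (a simplification of Flink's lateness semantics).
     *
     * @param registerUpFront true: every partition is registered before any
     *        data is read; false: a partition is registered only when its
     *        first record arrives.
     */
    static int countLate(long[][] arrivals, int partitionCount, boolean registerUpFront) {
        Map<Integer, Long> perPartition = new HashMap<>();
        if (registerUpFront) {
            // A registered partition without data holds the combined watermark back.
            for (int p = 0; p < partitionCount; p++) {
                perPartition.put(p, Long.MIN_VALUE);
            }
        }
        long combined = Long.MIN_VALUE;
        int late = 0;
        for (long[] event : arrivals) {
            int partition = (int) event[0];
            long timestamp = event[1];
            if (timestamp < combined) {
                late++;
            }
            // Per-partition watermark: highest timestamp seen so far (assumption).
            perPartition.merge(partition, timestamp, Long::max);
            // Combined watermark: minimum over registered partitions, never decreasing.
            combined = Math.max(combined, Collections.min(perPartition.values()));
        }
        return late;
    }

    /** Hypothetical data: partition 0 is read completely before partition 1. */
    static long[][] sampleArrivals() {
        return new long[][] {
            {0, 100}, {0, 200}, {0, 300},
            {1, 100}, {1, 200}, {1, 300},
        };
    }

    public static void main(String[] args) {
        long[][] arrivals = sampleArrivals();
        System.out.println("registered on first record: " + countLate(arrivals, 2, false));
        System.out.println("registered up front:        " + countLate(arrivals, 2, true));
    }
}
```

With this sample data, registering a partition only on its first record lets 
the watermark reach 300 before partition 1 is seen, so its records at 100 and 
200 are counted as late. Registering both partitions up front keeps the 
watermark back until partition 1 delivers data, and no record is late.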

Two files are attached: the messages in Kafka and the corresponding TM logs.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
