xzw0223 created FLINK-31192:
-------------------------------

             Summary: dataGen takes too long to initialize under sequence
                 Key: FLINK-31192
                 URL: https://issues.apache.org/jira/browse/FLINK-31192
             Project: Flink
          Issue Type: Improvement
    Affects Versions: 1.16.1, 1.16.0
            Reporter: xzw0223
             Fix For: 1.16.1, 1.16.0


The SequenceGenerator preloads all sequence values in open. If the total number 
of elements is large, initialization takes too long.
[https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/datagen/SequenceGenerator.java#L91]



The reason is that the Deque doubles its capacity whenever it becomes full, and 
each expansion requires copying the backing array, which is time-consuming.
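
To make the cost concrete, here is a minimal sketch of an eager preload. It is 
not the actual SequenceGenerator code; the class and method names are 
hypothetical, and it assumes end is smaller than Long.MAX_VALUE.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration only; not the actual Flink implementation.
class EagerPreload {
    // Materializes every value in [start, end] up front, the way a preload in
    // open would. Time and memory both grow linearly with the range size.
    // Assumes end < Long.MAX_VALUE so the loop counter cannot overflow.
    static Deque<Long> preload(long start, long end) {
        Deque<Long> values = new ArrayDeque<>(); // starts at the default capacity
        for (long v = start; v <= end; v++) {
            // When the backing array is full, the deque allocates a larger
            // array and copies the existing elements into it.
            values.add(v);
        }
        return values;
    }
}
```

Nothing can be emitted until the loop has finished, so a very large range 
delays the first record by the full preload time.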

 

Here's what I propose:
 Do not preload the full sequence. Instead, generate one value each time next 
is called, which removes the slow initialization caused by loading all the 
data up front.

 Record the current position in the sequence in the checkpoint, and after an 
abnormal restart resume sending data from the recorded position to ensure 
fault tolerance.
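
The two points above could be sketched roughly as follows. This is a plain Java 
illustration under stated assumptions, not a patch: LazySequence, 
snapshotPosition and restorePosition are hypothetical names, the sketch ignores 
how the range is split across parallel subtasks, and it does not use Flink's 
checkpoint APIs.

```java
import java.util.NoSuchElementException;

// Hypothetical sketch; names are illustrative and not part of Flink.
class LazySequence {
    private final long end; // last value to emit (inclusive)
    private long next;      // next value to emit; the only state to checkpoint

    // Assumes end < Long.MAX_VALUE so that next++ cannot overflow.
    LazySequence(long start, long end) {
        this.next = start;
        this.end = end;
    }

    boolean hasNext() {
        return next <= end;
    }

    // Computes the value on demand, so construction is O(1) for any range size.
    long next() {
        if (!hasNext()) {
            throw new NoSuchElementException("sequence exhausted");
        }
        return next++;
    }

    // Value that would be written into a checkpoint.
    long snapshotPosition() {
        return next;
    }

    // Called after a restart with the value read back from the checkpoint.
    void restorePosition(long position) {
        this.next = position;
    }
}
```

With this shape, the state to persist is a single long per generator rather 
than the remaining contents of a Deque.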



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
