[
https://issues.apache.org/jira/browse/FLINK-31192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692510#comment-17692510
]
Weijie Guo commented on FLINK-31192:
------------------------------------
[~xzw0223] Thanks for reporting this. Does this problem only exist in 1.16? If
not, please update the affected versions accordingly.
> dataGen takes too long to initialize under sequence
> ---------------------------------------------------
>
> Key: FLINK-31192
> URL: https://issues.apache.org/jira/browse/FLINK-31192
> Project: Flink
> Issue Type: Improvement
> Affects Versions: 1.16.0, 1.16.1
> Reporter: xzw0223
> Priority: Major
> Fix For: 1.16.0, 1.16.1
>
>
> The SequenceGenerator preloads all sequence values in open(). If the total
> number of elements is large, initialization takes too long.
> [https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/datagen/SequenceGenerator.java#L91]
> The reason is that the Deque doubles its capacity whenever it becomes full,
> and each expansion requires an array copy, which is time-consuming.
>
> Here is what I propose:
> * Do not preload the full sequence. Instead, generate one value each time
> next() is called, which avoids the slow initialization caused by loading
> all values up front.
> * Record the current sequence position in the checkpoint, and after a
> failure and restart, resume sending from the recorded position to ensure
> fault tolerance.
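
For illustration only, the proposal in the description could be sketched
roughly as below. This is a minimal, Flink-independent sketch and not the
actual SequenceGenerator code: the class name LazySequence and the methods
snapshotPosition and restorePosition are hypothetical, and the sketch ignores
how the range is split across parallel subtasks. It assumes the only state
that needs to go into a checkpoint is the next value to emit.

```java
import java.util.NoSuchElementException;

/**
 * Hypothetical lazy sequence: values are computed on demand instead of
 * being preloaded into a Deque, so construction is O(1) in time and memory.
 */
final class LazySequence {
    private final long end; // inclusive upper bound of the sequence
    private long next;      // next value to emit; the only state to checkpoint

    LazySequence(long start, long end) {
        // Assumption of this sketch: end < Long.MAX_VALUE, so next++ cannot overflow.
        if (end == Long.MAX_VALUE) {
            throw new IllegalArgumentException("end must be < Long.MAX_VALUE");
        }
        this.next = start;
        this.end = end;
    }

    boolean hasNext() {
        return next <= end;
    }

    long next() {
        if (!hasNext()) {
            throw new NoSuchElementException("sequence exhausted");
        }
        return next++;
    }

    /** Value to store in a checkpoint: the position of the next element. */
    long snapshotPosition() {
        return next;
    }

    /** Resume from a previously stored position after a restart. */
    void restorePosition(long position) {
        this.next = position;
    }
}
```

With this shape, a restarted instance that calls restorePosition with the
checkpointed value continues exactly where the previous instance stopped,
without regenerating or re-buffering earlier elements.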