[
https://issues.apache.org/jira/browse/FLINK-31008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687046#comment-17687046
]
ming li commented on FLINK-31008:
---------------------------------
[~lzljs3620320] I have created a pull request, please review it if you have
time. Thanks.
> [Flink][Table Store] The Split allocation of the same bucket in
> ContinuousFileSplitEnumerator may be out of order
> -----------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-31008
> URL: https://issues.apache.org/jira/browse/FLINK-31008
> Project: Flink
> Issue Type: Bug
> Components: Table Store
> Reporter: ming li
> Assignee: ming li
> Priority: Major
> Labels: pull-request-available
>
> There are two places in {{ContinuousFileSplitEnumerator}} that add
> {{FileStoreSourceSplit}} to {{bucketSplits}}: {{addSplitsBack}} and
> {{processDiscoveredSplits}}. {{processDiscoveredSplits}} continuously
> checks for new splits and appends them to the queue, so the splits of
> each bucket are queued in order.
> {code:java}
> private void addSplits(Collection<FileStoreSourceSplit> splits) {
>     splits.forEach(this::addSplit);
> }
>
> private void addSplit(FileStoreSourceSplit split) {
>     bucketSplits
>             .computeIfAbsent(((DataSplit) split.split()).bucket(), i -> new LinkedList<>())
>             .add(split);
> }
> {code}
> However, when a task fails over, the splits that had already been assigned
> to it are returned to the enumerator. These returned splits are also
> appended to the end of the queue, behind splits that were discovered later,
> so splits of the same bucket may be assigned out of order.
>
> I think these returned splits should be added to the head of the queue to
> ensure the order of allocation.
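
The proposal above can be illustrated with a minimal, self-contained sketch. It is not the code from the pull request: {{Split}}, {{poll}}, and {{order}} are hypothetical stand-ins introduced only for illustration, and the sketch assumes that the returned splits arrive in their original assignment order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

class BucketSplitQueueSketch {

    /** Hypothetical stand-in for FileStoreSourceSplit: a bucket id plus a sequence number. */
    static final class Split {
        final int bucket;
        final long seq;

        Split(int bucket, long seq) {
            this.bucket = bucket;
            this.seq = seq;
        }
    }

    private final Map<Integer, LinkedList<Split>> bucketSplits = new HashMap<>();

    /** Newly discovered splits are appended to the tail of their bucket's queue. */
    void addSplits(Collection<Split> splits) {
        for (Split s : splits) {
            bucketSplits.computeIfAbsent(s.bucket, k -> new LinkedList<>()).addLast(s);
        }
    }

    /**
     * Returned splits are pushed to the head of their bucket's queue.
     * Iterating in reverse keeps their relative order (assumption: the
     * list is in the order in which the splits were originally assigned).
     */
    void addSplitsBack(List<Split> returned) {
        for (int i = returned.size() - 1; i >= 0; i--) {
            Split s = returned.get(i);
            bucketSplits.computeIfAbsent(s.bucket, k -> new LinkedList<>()).addFirst(s);
        }
    }

    /** Removes and returns the next split of a bucket, or null if none is queued. */
    Split poll(int bucket) {
        LinkedList<Split> queue = bucketSplits.get(bucket);
        return queue == null ? null : queue.pollFirst();
    }

    /** Sequence numbers currently queued for a bucket, in assignment order. */
    List<Long> order(int bucket) {
        List<Long> result = new ArrayList<>();
        for (Split s : bucketSplits.getOrDefault(bucket, new LinkedList<>())) {
            result.add(s.seq);
        }
        return result;
    }

    public static void main(String[] args) {
        BucketSplitQueueSketch q = new BucketSplitQueueSketch();
        q.addSplits(Arrays.asList(new Split(0, 1), new Split(0, 2), new Split(0, 3)));
        Split first = q.poll(0);                        // split 1 is assigned
        Split second = q.poll(0);                       // split 2 is assigned
        q.addSplits(Arrays.asList(new Split(0, 4)));    // split 4 is discovered meanwhile
        q.addSplitsBack(Arrays.asList(first, second));  // the reader fails over
        System.out.println(q.order(0));                 // prints [1, 2, 3, 4]
    }
}
```

If {{addSplitsBack}} appended to the tail instead, the same sequence would leave bucket 0 queued as 3, 4, 1, 2, which is the disorder described in the issue.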
--
This message was sent by Atlassian Jira
(v8.20.10#820010)