ming li created FLINK-31008:
-------------------------------
Summary: [Flink][Table Store] The Split allocation of the same
bucket in ContinuousFileSplitEnumerator may be out of order
Key: FLINK-31008
URL: https://issues.apache.org/jira/browse/FLINK-31008
Project: Flink
Issue Type: Bug
Components: Table Store
Reporter: ming li
There are two places in {{ContinuousFileSplitEnumerator}} that add
{{FileStoreSourceSplit}} to {{bucketSplits}}: {{addSplitsBack}} and
{{processDiscoveredSplits}}. {{processDiscoveredSplits}} continuously
checks for new splits and appends them to the queue, so on this path the
splits of a bucket are queued in order.
{code:java}
private void addSplits(Collection<FileStoreSourceSplit> splits) {
    splits.forEach(this::addSplit);
}

private void addSplit(FileStoreSourceSplit split) {
    bucketSplits
            .computeIfAbsent(((DataSplit) split.split()).bucket(), i -> new LinkedList<>())
            .add(split);
}
{code}
However, when a task fails over, the splits that were previously assigned to
it are returned to the enumerator through {{addSplitsBack}}. These returned
splits are also appended to the end of the queue, so they can end up behind
newer splits of the same bucket, and the splits of that bucket are then
assigned out of order.
I think these returned splits should be added to the head of the queue to
ensure the order of allocation.
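A minimal sketch of the proposed behaviour, not the actual Table Store code: {{BucketSplitQueue}}, {{addDiscovered}}, {{addBack}} and {{drain}} are hypothetical names, and split IDs are plain strings standing in for {{FileStoreSourceSplit}}. It assumes the returned splits arrive in their original order and all precede whatever is still queued for that bucket.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the enumerator's per-bucket queues.
class BucketSplitQueue {
    private final Map<Integer, Deque<String>> bucketSplits = new HashMap<>();

    // Newly discovered splits are appended to the tail of the bucket's queue.
    void addDiscovered(int bucket, String split) {
        bucketSplits.computeIfAbsent(bucket, b -> new ArrayDeque<>()).addLast(split);
    }

    // Returned splits are pushed onto the head of the bucket's queue.
    // Iterating in reverse preserves their original relative order.
    void addBack(int bucket, List<String> returned) {
        Deque<String> queue = bucketSplits.computeIfAbsent(bucket, b -> new ArrayDeque<>());
        for (int i = returned.size() - 1; i >= 0; i--) {
            queue.addFirst(returned.get(i));
        }
    }

    // Next split to assign for the bucket, or null if none is queued.
    String poll(int bucket) {
        Deque<String> queue = bucketSplits.get(bucket);
        return queue == null ? null : queue.pollFirst();
    }

    // Removes and returns all queued splits of the bucket in assignment order.
    List<String> drain(int bucket) {
        List<String> result = new ArrayList<>();
        String next;
        while ((next = poll(bucket)) != null) {
            result.add(next);
        }
        return result;
    }
}
```

With this ordering, if s1 and s2 were assigned and then returned while s3 and s4 are still queued, the bucket is drained as s1, s2, s3, s4. Appending the returned splits to the tail would instead give s3, s4, s1, s2.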
--
This message was sent by Atlassian Jira
(v8.20.10#820010)