[ https://issues.apache.org/jira/browse/FLINK-31285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695411#comment-17695411 ]
Jark Wu commented on FLINK-31285:
---------------------------------
I think extending FlinkSourceBuilder to accept a user-defined FileSplitAssigner
is totally reasonable. Do you have any ideas on how to sort the files?
> FileSource should support reading files in order
> ------------------------------------------------
>
> Key: FLINK-31285
> URL: https://issues.apache.org/jira/browse/FLINK-31285
> Project: Flink
> Issue Type: New Feature
> Components: Connectors / FileSystem
> Affects Versions: 1.18.0
> Reporter: Yaroslav Tkachenko
> Priority: Major
>
> Currently, Flink's *FileSource* uses *LocalityAwareSplitAssigner* as the
> default *FileSplitAssigner*, which doesn't guarantee any order. In many
> scenarios involving historical data, reading files in order can be a
> requirement, especially when using event-time processing.
> I believe a new FileSplitAssigner that supports ordering should be
> implemented, and FileSourceBuilder should be extended to allow choosing a
> different FileSplitAssigner.
> It's also clear that the files may not be read in _perfect_ order with
> parallelism > 1. However, in some cases, using a parallelism of 1 might be fine.
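A minimal, standalone Java sketch of what an ordering assigner could look like follows. It does not use Flink: FileSplit, SplitAssigner, OrderedSplitAssigner and Main are hypothetical, simplified stand-ins loosely modeled on Flink's FileSourceSplit and FileSplitAssigner (the real getNext also receives the requesting host's name, and remaining splits are checkpointed). The sort key is an assumption; the example orders by path, and a comparator on modification time would work the same way.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.PriorityQueue;

public class Main {
    // Hypothetical stand-in for a file split: only the fields used for ordering.
    record FileSplit(String path, long modificationTime) {}

    // Hypothetical, simplified assigner contract (not Flink's real interface).
    interface SplitAssigner {
        Optional<FileSplit> getNext();
        void addSplits(Collection<FileSplit> splits);
        Collection<FileSplit> remainingSplits();
    }

    // Hands out splits strictly in comparator order, ignoring host locality.
    static final class OrderedSplitAssigner implements SplitAssigner {
        private final Comparator<FileSplit> order;
        private final PriorityQueue<FileSplit> queue;

        OrderedSplitAssigner(Collection<FileSplit> splits, Comparator<FileSplit> order) {
            this.order = order;
            this.queue = new PriorityQueue<>(order);
            this.queue.addAll(splits);
        }

        @Override
        public Optional<FileSplit> getNext() {
            // Smallest split according to the comparator, or empty when exhausted.
            return Optional.ofNullable(queue.poll());
        }

        @Override
        public void addSplits(Collection<FileSplit> splits) {
            // Returned splits (e.g. after a reader failure) re-enter at their sorted position.
            queue.addAll(splits);
        }

        @Override
        public Collection<FileSplit> remainingSplits() {
            // Snapshot of unassigned splits in assignment order; does not drain the queue.
            List<FileSplit> remaining = new ArrayList<>(queue);
            remaining.sort(order);
            return remaining;
        }
    }

    public static void main(String[] args) {
        // Hypothetical example files, deliberately listed out of order.
        List<FileSplit> splits = List.of(
                new FileSplit("/data/2023-03-02.csv", 300L),
                new FileSplit("/data/2023-03-01.csv", 200L),
                new FileSplit("/data/2023-02-28.csv", 100L));
        SplitAssigner assigner =
                new OrderedSplitAssigner(splits, Comparator.comparing(FileSplit::path));
        Optional<FileSplit> next;
        while ((next = assigner.getNext()).isPresent()) {
            // Prints 2023-02-28, then 2023-03-01, then 2023-03-02.
            System.out.println(next.get().path());
        }
    }
}
```

Note that this only orders the assignment of splits. With parallelism > 1, several readers still process their splits concurrently, so files may finish out of order, which matches the caveat above.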
--
This message was sent by Atlassian Jira
(v8.20.10#820010)