[ 
https://issues.apache.org/jira/browse/FLINK-31285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695411#comment-17695411
 ] 

Jark Wu commented on FLINK-31285:
---------------------------------

I think extending FileSourceBuilder to accept a user-defined FileSplitAssigner 
is totally reasonable. Do you have any ideas on how to sort the files? 

> FileSource should support reading files in order
> ------------------------------------------------
>
>                 Key: FLINK-31285
>                 URL: https://issues.apache.org/jira/browse/FLINK-31285
>             Project: Flink
>          Issue Type: New Feature
>          Components: Connectors / FileSystem
>    Affects Versions: 1.18.0
>            Reporter: Yaroslav Tkachenko
>            Priority: Major
>
> Currently, Flink's *FileSource* uses *LocalityAwareSplitAssigner* as the 
> default *FileSplitAssigner*, which doesn't guarantee any order. In many 
> scenarios involving processing historical data, reading files in order can be 
> a requirement, especially when using event-time processing. 
> I believe a new FileSplitAssigner should be implemented that supports 
> ordering. FileSourceBuilder should be extended to allow choosing a different 
> FileSplitAssigner.
> It's also clear that the files may not be read in _perfect_ order with 
> parallelism > 1. However, in some cases, using a parallelism of 1 might be fine.

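One way to picture the ordered assigner proposed in the description is a
priority queue over the pending splits. The sketch below is self-contained and
does not use Flink's classes: Split and OrderedSplitAssigner are hypothetical
stand-ins, the three methods are only loosely modeled on what a split assigner
has to offer, and ordering by path and then by offset is an assumption. A
modification time or a timestamp parsed from the file name could be plugged in
as the comparator instead.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.PriorityQueue;

// Hypothetical stand-in for a file split: a path plus a start offset.
final class Split {
    final String path;
    final long offset;

    Split(String path, long offset) {
        this.path = path;
        this.offset = offset;
    }
}

// Sketch of an assigner that always hands out the smallest pending split.
final class OrderedSplitAssigner {
    // Assumed ordering: lexicographic by path, then by offset within a file.
    private static final Comparator<Split> ORDER =
            Comparator.comparing((Split s) -> s.path)
                    .thenComparingLong(s -> s.offset);

    private final PriorityQueue<Split> queue = new PriorityQueue<>(ORDER);

    OrderedSplitAssigner(Collection<Split> initialSplits) {
        queue.addAll(initialSplits);
    }

    // The hostname is ignored: ordering takes priority over locality here.
    Optional<Split> getNext(String hostname) {
        return Optional.ofNullable(queue.poll());
    }

    // Splits handed back (for example after a reader failure) rejoin the order.
    void addSplits(Collection<Split> splits) {
        queue.addAll(splits);
    }

    // Returns a sorted snapshot of the splits that are still unassigned.
    Collection<Split> remainingSplits() {
        List<Split> remaining = new ArrayList<>(queue);
        remaining.sort(ORDER);
        return remaining;
    }
}
```

With a parallelism of 1, a single reader pulling from such an assigner would
see the splits in comparator order. With more readers, only the order of
assignment is fixed, not the order in which records are emitted, which matches
the caveat in the description.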


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
