[ https://issues.apache.org/jira/browse/FLINK-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17323456#comment-17323456 ]
Flink Jira Bot commented on FLINK-10518:
----------------------------------------
This issue is assigned but has not received an update in 7 days so it has been
labeled "stale-assigned". If you are still working on the issue, please give an
update and remove the label. If you are no longer working on the issue, please
unassign so someone else may work on it. In 7 days the issue will be
automatically unassigned.
> Inefficient design in ContinuousFileMonitoringFunction
> ------------------------------------------------------
>
> Key: FLINK-10518
> URL: https://issues.apache.org/jira/browse/FLINK-10518
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / FileSystem
> Affects Versions: 1.5.2
> Reporter: Huyen Levan
> Assignee: Guibo Pan
> Priority: Major
> Labels: Source:FileSystem, stale-assigned
>
> The ContinuousFileMonitoringFunction class keeps track of the latest file
> modification time to rule out all files it has processed in the previous
> cycles. For a long-running job, the list of eligible files will be much
> smaller than the list of all files in the folder being monitored.
> In the current implementation of the getInputSplitsSortedByModTime method, a
> (big) list of all available splits is created first, and then every single
> split is checked against the list of eligible files.
> {quote}for (FileInputSplit split: format.createInputSplits(readerParallelism)) {
>     FileStatus fileStatus = eligibleFiles.get(split.getPath());
>     if (fileStatus != null) {
> {quote}
> This could be improved as follows:
> * The listing of all files should be done only once, in
> _ContinuousFileMonitoringFunction.listEligibleFiles()_ (currently the listing
> is repeated in _FileInputFormat.createInputSplits()_ ).
> * The list of file splits should then be created directly from the paths in
> eligibleFiles, rather than from every file in the monitored folder.
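
The two steps proposed above can be sketched roughly as follows. This is a
minimal, self-contained illustration and not Flink code: FileInfo and Split
are hypothetical stand-ins for FileStatus and FileInputSplit, the method
splitsFromEligible and the maxSplitSize parameter are assumptions made for the
example, and the fixed-size splitting only approximates what a real input
format would do.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class EligibleSplitsSketch {

    /** Hypothetical stand-in for FileStatus: path, length in bytes, modification time. */
    static final class FileInfo {
        final String path;
        final long length;
        final long modTime;

        FileInfo(String path, long length, long modTime) {
            this.path = path;
            this.length = length;
            this.modTime = modTime;
        }
    }

    /** Hypothetical stand-in for FileInputSplit: a byte range of one file. */
    static final class Split {
        final String path;
        final long start;
        final long length;
        final long modTime;

        Split(String path, long start, long length, long modTime) {
            this.path = path;
            this.start = start;
            this.length = length;
            this.modTime = modTime;
        }
    }

    /** Step 1: a single pass over the listing keeps only files newer than lastModTime. */
    static Map<String, FileInfo> listEligibleFiles(List<FileInfo> listing, long lastModTime) {
        Map<String, FileInfo> eligible = new LinkedHashMap<>();
        for (FileInfo file : listing) {
            if (file.modTime > lastModTime) {
                eligible.put(file.path, file);
            }
        }
        return eligible;
    }

    /** Step 2: build splits from the eligible files only, ordered by modification time. */
    static List<Split> splitsFromEligible(Map<String, FileInfo> eligible, long maxSplitSize) {
        if (maxSplitSize <= 0) {
            throw new IllegalArgumentException("maxSplitSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        for (FileInfo file : eligible.values()) {
            long start = 0;
            do {
                // An empty file still yields one zero-length split.
                long len = Math.min(maxSplitSize, file.length - start);
                splits.add(new Split(file.path, start, len, file.modTime));
                start += len;
            } while (start < file.length);
        }
        // The sort is stable, so splits of the same file stay in offset order.
        splits.sort(Comparator.comparingLong((Split s) -> s.modTime));
        return splits;
    }
}
```

With this shape, the work done per monitoring cycle grows with the number of
new or modified files instead of the total number of files in the folder,
which is the saving the issue describes for long-running jobs.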
--
This message was sent by Atlassian Jira
(v8.3.4#803005)