[
https://issues.apache.org/jira/browse/FLINK-10518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Flink Jira Bot updated FLINK-10518:
-----------------------------------
Labels: Source:FileSystem auto-unassigned stale-major (was:
Source:FileSystem auto-unassigned)
I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help
the community manage its development. I see this issue has been marked as
Major but is unassigned, and neither it nor its Sub-Tasks have been updated
for 30 days. I have gone ahead and added a "stale-major" label to the issue. If
this ticket is a Major, please either assign yourself or give an update.
Afterwards, please remove the label, or in 7 days the issue will be
deprioritized.
> Inefficient design in ContinuousFileMonitoringFunction
> ------------------------------------------------------
>
> Key: FLINK-10518
> URL: https://issues.apache.org/jira/browse/FLINK-10518
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / FileSystem
> Affects Versions: 1.5.2
> Reporter: Huyen Levan
> Priority: Major
> Labels: Source:FileSystem, auto-unassigned, stale-major
>
> The ContinuousFileMonitoringFunction class keeps track of the latest file
> modification time to rule out all files it has processed in the previous
> cycles. For a long-running job, the list of eligible files will be much
> smaller than the list of all files in the folder being monitored.
> In the current implementation of the getInputSplitsSortedByModTime method, a
> (big) list of all available splits is created first, and then every single
> split is checked against the list of eligible files.
> {quote}
> for (FileInputSplit split : format.createInputSplits(readerParallelism)) {
>     FileStatus fileStatus = eligibleFiles.get(split.getPath());
>     if (fileStatus != null) {
> {quote}
> A possible improvement:
> * List all files only once, in
> _ContinuousFileMonitoringFunction.listEligibleFiles()_ (currently the listing
> is done a second time in _FileInputFormat.createInputSplits()_).
> * Create the list of file splits directly from the paths in eligibleFiles.
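
The proposal above could be sketched roughly as follows. This is a minimal, self-contained illustration and not Flink code: FileInfo and Split are hypothetical stand-ins for FileStatus and FileInputSplit, and the method names, the fixed splitSize parameter, and the "modified after the last seen time" eligibility rule are all assumptions made for the example. The point it tries to show is that splits are built only from the already filtered file list, so no second listing and no per-split lookup is needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class EligibleSplitsSketch {

    /** Hypothetical stand-in for Flink's FileStatus. */
    static final class FileInfo {
        final String path;
        final long modTime;
        final long length;

        FileInfo(String path, long modTime, long length) {
            this.path = path;
            this.modTime = modTime;
            this.length = length;
        }
    }

    /** Hypothetical stand-in for Flink's FileInputSplit. */
    static final class Split {
        final String path;
        final long start;
        final long length;

        Split(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    /** Single listing pass: keep only files modified after the last processed time (assumed rule). */
    static List<FileInfo> listEligibleFiles(List<FileInfo> allFiles, long globalModTime) {
        List<FileInfo> eligible = new ArrayList<>();
        for (FileInfo file : allFiles) {
            if (file.modTime > globalModTime) {
                eligible.add(file);
            }
        }
        return eligible;
    }

    /** Builds splits from the eligible files only, grouped by modification time in ascending order. */
    static Map<Long, List<Split>> splitsSortedByModTime(List<FileInfo> eligible, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        Map<Long, List<Split>> result = new TreeMap<>();
        for (FileInfo file : eligible) {
            List<Split> splits = result.computeIfAbsent(file.modTime, k -> new ArrayList<>());
            for (long start = 0; start < file.length; start += splitSize) {
                splits.add(new Split(file.path, start, Math.min(splitSize, file.length - start)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<FileInfo> all = Arrays.asList(
                new FileInfo("/data/old.csv", 100L, 250L),
                new FileInfo("/data/new.csv", 300L, 250L));
        // Only /data/new.csv is newer than the last processed time (200), so only it is split.
        Map<Long, List<Split>> splits = splitsSortedByModTime(listEligibleFiles(all, 200L), 100L);
        System.out.println(splits.keySet() + " -> " + splits.get(300L).size() + " splits");
        // prints "[300] -> 3 splits"
    }
}
```

With this shape, the work per monitoring cycle is proportional to the number of eligible files rather than to the number of files in the monitored folder. How this would map onto the real FileInputFormat API is left open here.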
--
This message was sent by Atlassian Jira
(v8.3.4#803005)