[
https://issues.apache.org/jira/browse/APEXMALHAR-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15654940#comment-15654940
]
ASF GitHub Bot commented on APEXMALHAR-2274:
--------------------------------------------
GitHub user mattqzhang opened a pull request:
https://github.com/apache/apex-malhar/pull/490
APEXMALHAR-2274 Handle large number of files for AbstractFileInputOpe…
@PramodSSImmaneni Please review
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mattqzhang/apex-malhar APEXMALHAR-2274
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/apex-malhar/pull/490.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #490
----
commit cb37761b62761d0e74f1c9b8c7c47491735e5421
Author: Matt Zhang <[email protected]>
Date: 2016-11-10T19:38:51Z
APEXMALHAR-2274 Handle large number of files for AbstractFileInputOperator
----
> AbstractFileInputOperator gets killed when there are a large number of files.
> -----------------------------------------------------------------------------
>
> Key: APEXMALHAR-2274
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2274
> Project: Apache Apex Malhar
> Issue Type: Bug
> Reporter: Munagala V. Ramanath
> Assignee: Matt Zhang
>
> When there are a large number of files in the monitored directory, the call
> to DirectoryScanner.scan() can take a long time since it calls
> FileSystem.listStatus(), which returns the entire listing in one call.
> Meanwhile, the AppMaster deems this operator hung and restarts it, which
> again results in the same problem.
> It should use FileSystem.listStatusIterator() [in Hadoop 2.7.X] or
> FileSystem.listFiles() [in 2.6.X] or similar calls that return
> a remote iterator, to limit the number of files processed in a single call.
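A minimal sketch of the iterator-based approach described above (this is not the code in PR #490; the class name IncrementalScanner and the batchSize knob are hypothetical). It keeps a RemoteIterator open across calls so each scan() invocation touches at most batchSize entries, using FileSystem.listStatusIterator() from Hadoop 2.7.x:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

/**
 * Hypothetical sketch of an incremental directory scan: each call returns at
 * most batchSize entries, so no single call blocks long enough for the
 * AppMaster to consider the operator hung.
 */
public class IncrementalScanner
{
  private final int batchSize;
  private RemoteIterator<FileStatus> iterator;   // carried across scan() calls

  public IncrementalScanner(int batchSize)
  {
    this.batchSize = batchSize;
  }

  public List<FileStatus> scan(FileSystem fs, Path dir) throws IOException
  {
    if (iterator == null || !iterator.hasNext()) {
      // Hadoop 2.7.x; on 2.6.x, fs.listFiles(dir, false) also returns a RemoteIterator
      iterator = fs.listStatusIterator(dir);
    }
    List<FileStatus> batch = new ArrayList<>();
    while (batch.size() < batchSize && iterator.hasNext()) {
      batch.add(iterator.next());
    }
    return batch;
  }
}

The sketch only shows the bounded-per-call iteration; the actual operator would still need to skip entries already recorded in its processed-files state when the iterator is reopened.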