[ https://issues.apache.org/jira/browse/APEXMALHAR-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15647751#comment-15647751 ]
Munagala V. Ramanath commented on APEXMALHAR-2274:
--------------------------------------------------

The benefit of such an interface is not clear at this point; perhaps when the need for polymorphic use of such an interface arises, we can refactor.

> AbstractFileInputOperator gets killed when there are a large number of files.
> -----------------------------------------------------------------------------
>
>                 Key: APEXMALHAR-2274
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2274
>             Project: Apache Apex Malhar
>          Issue Type: Bug
>            Reporter: Munagala V. Ramanath
>            Assignee: Matt Zhang
>
> When there are a large number of files in the monitored directory, the call
> to DirectoryScanner.scan() can take a long time since it calls
> FileSystem.listStatus() which returns the entire list. Meanwhile, the
> AppMaster deems this operator hung and restarts it which again results in the
> same problem.
> It should use FileSystem.listStatusIterator() [in Hadoop 2.7.X] or
> FileSystem.listFiles() [in 2.6.X] or other similar calls that return
> a remote iterator to limit the number of files processed in a single call.
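
For illustration only, here is a minimal sketch of the batching idea in the description: pull a bounded number of entries from an iterator per call, so no single scan runs long enough for the operator to look hung. It is not the Malhar implementation. RemoteIter, wrap() and nextBatch() are hypothetical names; RemoteIter is a stand-in that mirrors the hasNext()/next() shape of Hadoop's RemoteIterator so that the sketch runs without a cluster. In a real operator the iterator would presumably come from FileSystem.listStatusIterator() or FileSystem.listFiles().

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class BatchedScan {

    /** Hypothetical stand-in with the same shape as Hadoop's RemoteIterator. */
    public interface RemoteIter<T> {
        boolean hasNext() throws IOException;
        T next() throws IOException;
    }

    /** Wraps an in-memory iterator so the sketch can run locally. */
    public static <T> RemoteIter<T> wrap(final Iterator<T> source) {
        return new RemoteIter<T>() {
            @Override public boolean hasNext() { return source.hasNext(); }
            @Override public T next() { return source.next(); }
        };
    }

    /** Pulls at most maxBatch entries; the rest stay in the iterator for a later call. */
    public static <T> List<T> nextBatch(RemoteIter<T> it, int maxBatch) throws IOException {
        List<T> batch = new ArrayList<>();
        while (batch.size() < maxBatch && it.hasNext()) {
            batch.add(it.next());
        }
        return batch;
    }

    public static void main(String[] args) throws IOException {
        // Five fake file names, scanned two at a time.
        RemoteIter<String> it = wrap(Arrays.asList("f1", "f2", "f3", "f4", "f5").iterator());
        List<String> batch;
        while (!(batch = nextBatch(it, 2)).isEmpty()) {
            System.out.println(batch); // one bounded unit of work per call
        }
    }
}
```

The iterator has to be kept between calls (for example as a transient field of the scanner) so that each call resumes where the previous one stopped. How that state should be handled when the operator is restored from a checkpoint is not addressed here.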