[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated MAPREDUCE-1981:
-------------------------------------

    Attachment: mapredListFiles.patch

This patch makes FileInputFormat and CombineFileInputFormat use the new
listFiles() API introduced by HADOOP-6870.
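To illustrate where the RPC savings come from, here is a minimal, dependency-free
Java sketch. FakeNameNode and SplitRpcDemo are hypothetical stand-ins, not Hadoop
classes; their method names only loosely echo listStatus, getFileBlockLocations
and a located listing. The model assumes one RPC per directory listing (ignoring
any paging of large directories) and one RPC per block-location lookup.

{code:java}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory NameNode that only counts RPCs. */
class FakeNameNode {
    private final Map<String, List<String>> dirs = new LinkedHashMap<>();
    int rpcCount = 0;

    void addDir(String dir, List<String> files) {
        dirs.put(dir, files);
    }

    /** Old pattern: one RPC returns the children without block locations. */
    List<String> listStatus(String dir) {
        rpcCount++;
        return dirs.get(dir);
    }

    /** Old pattern: one extra RPC per file to fetch its block locations. */
    String getFileBlockLocations(String file) {
        rpcCount++;
        return "blocks-of-" + file;
    }

    /** New pattern: one RPC returns the files together with their locations. */
    List<String> listLocatedStatus(String dir) {
        rpcCount++;
        List<String> located = new ArrayList<>();
        for (String f : dirs.get(dir)) {
            located.add(f + ":blocks-of-" + f);
        }
        return located;
    }
}

class SplitRpcDemo {
    /** Counts RPCs for "list the directory, then look up each file". */
    static int oldStyleRpcs(FakeNameNode nn, List<String> inputDirs) {
        nn.rpcCount = 0;
        for (String dir : inputDirs) {
            for (String file : nn.listStatus(dir)) {
                nn.getFileBlockLocations(file);
            }
        }
        return nn.rpcCount;
    }

    /** Counts RPCs for "list the directory with locations attached". */
    static int newStyleRpcs(FakeNameNode nn, List<String> inputDirs) {
        nn.rpcCount = 0;
        for (String dir : inputDirs) {
            nn.listLocatedStatus(dir);
        }
        return nn.rpcCount;
    }

    public static void main(String[] args) {
        FakeNameNode nn = new FakeNameNode();
        nn.addDir("/in/a", List.of("/in/a/part-0", "/in/a/part-1", "/in/a/part-2"));
        nn.addDir("/in/b", List.of("/in/b/part-0", "/in/b/part-1"));
        List<String> inputs = List.of("/in/a", "/in/b");
        System.out.println("old: " + oldStyleRpcs(nn, inputs)); // old: 7
        System.out.println("new: " + newStyleRpcs(nn, inputs)); // new: 2
    }
}
{code}

With five files in two directories, the per-file lookup pattern costs 7 calls in
this model while the located listing costs 2; the gap grows with the file count.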

Ideally, FileInputFormat#listStatus would have the following signature:
{code}
Iterator<LocatedFileStatus> listStatus(JobConf job) throws IOException;
{code}

However, since this is a public interface, I have left the signature unchanged
to preserve backward compatibility.
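One way to keep an array-returning public method while consuming an iterator
internally is to drain the iterator into the array. Because LocatedFileStatus is
a subclass of FileStatus, the elements keep their block locations and a caller
can recover them with an instanceof check. The sketch below models this with
hypothetical Status and LocatedStatus classes standing in for FileStatus and
LocatedFileStatus; the List argument stands in for the remote directory. It is an
illustration, not the code in mapredListFiles.patch.

{code:java}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for FileStatus. */
class Status {
    final String path;

    Status(String path) {
        this.path = path;
    }
}

/** Hypothetical stand-in for LocatedFileStatus: a Status plus block hosts. */
class LocatedStatus extends Status {
    final String[] hosts;

    LocatedStatus(String path, String... hosts) {
        super(path);
        this.hosts = hosts;
    }
}

class CompatListing {
    /** The "ideal" iterator-based listing (hypothetical). */
    static Iterator<LocatedStatus> listLocated(List<LocatedStatus> source) {
        return source.iterator();
    }

    /** Backward-compatible, array-returning listing: drains the iterator. */
    static Status[] listStatus(List<LocatedStatus> source) {
        List<Status> result = new ArrayList<>();
        Iterator<LocatedStatus> it = listLocated(source);
        while (it.hasNext()) {
            result.add(it.next());
        }
        return result.toArray(new Status[0]);
    }

    public static void main(String[] args) {
        List<LocatedStatus> listing = List.of(
            new LocatedStatus("/in/part-0", "dn1", "dn2"),
            new LocatedStatus("/in/part-1", "dn3"));
        for (Status s : listStatus(listing)) {
            // A split calculator can reuse the locations without another lookup.
            if (s instanceof LocatedStatus) {
                System.out.println(s.path + " -> " + String.join(",", ((LocatedStatus) s).hosts));
            }
        }
    }
}
{code}

The trade-off is that the whole listing is materialized in memory, which the
iterator-based signature would have avoided.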

This patch also slightly changes the semantics of listStatus. When recursive
is false, listStatus used to return every child of the input directories,
including subdirectories. With the new API, it returns only the children that
are files. As a result, it can no longer support throwing an exception when an
input directory contains a subdirectory but recursive is false, so I removed
that test case from TestFileInputFormat. If we really need to support this
scenario, I could make FileContext#listFiles throw an exception when recursive
is false and a subdirectory is encountered.
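The behavioural difference can be shown with a small, self-contained Java model.
Entry, ListingSemantics and all three methods are hypothetical; strictListFiles
sketches the optional FileContext#listFiles behaviour proposed above, and its
exception type and message are chosen only for illustration.

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical directory entry: a path plus a directory flag. */
class Entry {
    final String path;
    final boolean isDir;

    Entry(String path, boolean isDir) {
        this.path = path;
        this.isDir = isDir;
    }
}

class ListingSemantics {
    /** Old non-recursive behaviour: every child, subdirectories included. */
    static List<String> oldListStatus(List<Entry> children) {
        List<String> out = new ArrayList<>();
        for (Entry e : children) {
            out.add(e.path);
        }
        return out;
    }

    /** New non-recursive behaviour: subdirectories are silently skipped. */
    static List<String> newListFiles(List<Entry> children) {
        List<String> out = new ArrayList<>();
        for (Entry e : children) {
            if (!e.isDir) {
                out.add(e.path);
            }
        }
        return out;
    }

    /** Hypothetical strict variant: fail as soon as a subdirectory is seen. */
    static List<String> strictListFiles(List<Entry> children) throws IOException {
        for (Entry e : children) {
            if (e.isDir) {
                throw new IOException("Not a file: " + e.path);
            }
        }
        return newListFiles(children);
    }

    public static void main(String[] args) {
        List<Entry> children = List.of(
            new Entry("/in/part-0", false),
            new Entry("/in/nested", true));
        System.out.println(oldListStatus(children)); // [/in/part-0, /in/nested]
        System.out.println(newListFiles(children));  // [/in/part-0]
        try {
            strictListFiles(children);
        } catch (IOException ex) {
            System.out.println(ex.getMessage());     // Not a file: /in/nested
        }
    }
}
{code}

Because the subdirectory never reaches the caller in the new listing, only a
check inside the listing itself (as in strictListFiles) could still reject it.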

> Improve getSplits performance by using listFiles, the new FileSystem API
> ------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1981
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1981
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: job submission
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0
>
>         Attachments: mapredListFiles.patch
>
>
> This jira will make FileInputFormat and CombineFileInputFormat use the new
> API, thus reducing the number of RPCs to the HDFS NameNode.

-- 
This message is automatically generated by JIRA.