[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hemanth Yamijala updated MAPREDUCE-1466:
----------------------------------------

    Attachment: MAPREDUCE-1466_yhadoop20-1.patch

The newly attached patch makes minor changes to the earlier one:

- Removed a System.err.println call from the old FileInputFormat. Note that the 
same information (the number of paths to process) is still available via a log 
statement in getSplits.
- Removed a duplicate call to listStatus in the new FileInputFormat, which 
looked like this:
{code}
+    List<FileStatus>files = listStatus(job);
     for (FileStatus file: listStatus(job)) {
{code}
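
For illustration, here is a minimal, self-contained sketch of the intended shape: list the inputs once, reuse that list in the loop, and record its size. This is not the patch itself. It uses plain Java stand-ins instead of the Hadoop classes, and the class name SplitSketch, the keys "example.input.dirs" and "example.input.num.files", and the call counter are all hypothetical names introduced only for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SplitSketch {
    // Hypothetical configuration key; the key used by the actual patch is not shown here.
    static final String NUM_INPUT_FILES_KEY = "example.input.num.files";

    // Counts how often the (potentially expensive) listing runs.
    static int listStatusCalls = 0;

    // Stand-in for listStatus(job): returns the input paths configured for the job.
    static List<String> listStatus(Map<String, String> job) {
        listStatusCalls++;
        List<String> files = new ArrayList<>();
        for (String path : job.getOrDefault("example.input.dirs", "").split(",")) {
            if (!path.isEmpty()) {
                files.add(path);
            }
        }
        return files;
    }

    // Lists the inputs once, records how many there are, and iterates over the same list.
    static List<String> getSplits(Map<String, String> job) {
        List<String> files = listStatus(job);
        // A Map stands in for the job configuration in this sketch.
        job.put(NUM_INPUT_FILES_KEY, Integer.toString(files.size()));
        List<String> splits = new ArrayList<>();
        for (String file : files) { // reuse 'files' rather than calling listStatus(job) again
            splits.add("split:" + file);
        }
        return splits;
    }

    public static void main(String[] args) {
        Map<String, String> job = new HashMap<>();
        job.put("example.input.dirs", "/data/a,/data/b,/data/c");
        List<String> splits = getSplits(job);
        System.out.println(splits.size() + " splits, listStatus calls: " + listStatusCalls);
        System.out.println(NUM_INPUT_FILES_KEY + " = " + job.get(NUM_INPUT_FILES_KEY));
    }
}
```

With the duplicate call in place, the listing would run twice per getSplits invocation, which can be costly when a job has many input paths.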

I also think we need test cases for the new API. However, there are currently 
no tests for any of the classes in the org.apache.hadoop.mapreduce.lib.input 
package, so this should probably be tracked in a separate JIRA.

Please let me know if the changes seem fine.

> FileInputFormat should save #input-files in JobConf
> ---------------------------------------------------
>
>                 Key: MAPREDUCE-1466
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1466
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: client
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>            Priority: Minor
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1466_yhadoop20-1.patch, 
> MAPREDUCE-1466_yhadoop20.patch
>
>
> We already track the amount of data consumed by MR applications 
> (MAP_INPUT_BYTES); along with that, it would be useful to track #input-files 
> from the client side for analysis. Along the lines of MAPREDUCE-1403, it would 
> be easy to stick it in the JobConf during job submission.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
