[ https://issues.apache.org/jira/browse/MAPREDUCE-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hemanth Yamijala updated MAPREDUCE-1466:
----------------------------------------

    Attachment: MAPREDUCE-1466_yhadoop20-1.patch

Minor changes to the earlier patch in the newly attached one:
- Removed a System.err println in the old FileInputFormat. Note that the same data (the number of paths to process) is also available via a log statement in getSplits.
- Removed a duplicate call to listStatus in the new FileInputFormat, which looked like this:
{code}
+    List<FileStatus> files = listStatus(job);
     for (FileStatus file: listStatus(job)) {
{code}

I also suppose we need test cases for the new API. However, there are no tests for any of the classes in the org.apache.hadoop.mapreduce.lib.input package, so this should possibly be a separate JIRA.

Please let me know if the changes seem fine.

> FileInputFormat should save #input-files in JobConf
> ---------------------------------------------------
>
>                 Key: MAPREDUCE-1466
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1466
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: client
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>            Priority: Minor
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1466_yhadoop20-1.patch, MAPREDUCE-1466_yhadoop20.patch
>
>
> We already track the amount of data consumed by MR applications (MAP_INPUT_BYTES); along with that, it would be useful to track #input-files from the client side for analysis. Along the lines of MAPREDUCE-1403, it would be easy to stick it in the JobConf during job submission.
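
For illustration only, here is a rough sketch of the idea discussed in this issue: list the input files once, record their count in the job configuration, and reuse the same list when computing splits. It is not the code from the attached patch. The class name, the configuration key, and the use of a plain Map and List of path strings in place of JobConf and FileStatus are all assumptions made so that the example runs without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InputFileCountSketch {
    // Hypothetical configuration key; the real key used by the patch is not shown in this message.
    static final String NUM_INPUT_FILES_KEY = "example.input.num.files";

    // Counts how often the (potentially expensive) listing is performed.
    static int listCalls = 0;

    // Stand-in for listStatus(job): returns the input files for the job.
    static List<String> listStatus(List<String> inputPaths) {
        listCalls++;
        return new ArrayList<>(inputPaths);
    }

    // Stand-in for getSplits(job): one split per file, for simplicity.
    static int getSplits(List<String> inputPaths, Map<String, String> conf) {
        // List the files once and keep the result.
        List<String> files = listStatus(inputPaths);

        // Save the number of input files in the configuration (a Map here, assumed stand-in for JobConf).
        conf.put(NUM_INPUT_FILES_KEY, Long.toString(files.size()));

        // Iterate over the saved list instead of calling listStatus a second time.
        int splits = 0;
        for (String file : files) {
            if (!file.isEmpty()) {
                splits++;
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        int splits = getSplits(Arrays.asList("/data/a", "/data/b", "/data/c"), conf);
        System.out.println("splits=" + splits
            + " numInputFiles=" + conf.get(NUM_INPUT_FILES_KEY)
            + " listCalls=" + listCalls);
    }
}
```

The listCalls counter exists only to make the effect of removing the duplicate call visible; it has no counterpart in the patch.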