Improve performance of CombineFileInputFormat when multiple pools are configured
--------------------------------------------------------------------------------
Key: MAPREDUCE-1423
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1423
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: client
Reporter: dhruba borthakur
Assignee: dhruba borthakur

I have a map-reduce job that uses CombineFileInputFormat. It has 10000 pools and 30000 files configured. Creating the splits takes more than an hour. The reason is that CombineFileInputFormat.getSplits() converts the same path from a String to a Path object multiple times, once for each pool. With 10000 pools and 30000 files, that is up to 300 million conversions. Similarly, it calls Path.toUri() multiple times. This code can be optimized.
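A minimal sketch of the kind of optimization described above. It is not the actual patch for this issue. The idea is to convert each input path once, before the per-pool loop, so the cost is proportional to the number of files rather than pools × files. To keep the example runnable without Hadoop, java.net.URI stands in for org.apache.hadoop.fs.Path, and a Predicate<URI> stands in for a pool's path filter. The names PoolSplitSketch, Pool, assignNaive, assignCached, parse and parseCount are hypothetical and are not Hadoop APIs.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch only: java.net.URI stands in for Hadoop's Path, and Predicate<URI>
// stands in for a pool's filter. All names here are hypothetical.
public class PoolSplitSketch {

    // Hypothetical pool: a name plus a filter over parsed paths.
    static final class Pool {
        final String name;
        final Predicate<URI> filter;

        Pool(String name, Predicate<URI> filter) {
            this.name = name;
            this.filter = filter;
        }
    }

    // Counts String -> URI conversions so the two strategies can be compared.
    static long parseCount = 0;

    static URI parse(String s) {
        parseCount++;
        return URI.create(s);
    }

    // Naive strategy: every pool re-parses every path string (pools x files conversions).
    static List<List<URI>> assignNaive(List<String> files, List<Pool> pools) {
        List<List<URI>> result = new ArrayList<>();
        for (Pool pool : pools) {
            List<URI> matched = new ArrayList<>();
            for (String f : files) {
                URI uri = parse(f); // repeated conversion of the same string
                if (pool.filter.test(uri)) {
                    matched.add(uri);
                }
            }
            result.add(matched);
        }
        return result;
    }

    // Cached strategy: parse each path string once, then let every pool
    // test the pre-parsed objects (files conversions in total).
    static List<List<URI>> assignCached(List<String> files, List<Pool> pools) {
        List<URI> parsed = new ArrayList<>(files.size());
        for (String f : files) {
            parsed.add(parse(f));
        }
        List<List<URI>> result = new ArrayList<>();
        for (Pool pool : pools) {
            List<URI> matched = new ArrayList<>();
            for (URI uri : parsed) {
                if (pool.filter.test(uri)) {
                    matched.add(uri);
                }
            }
            result.add(matched);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> files = List.of(
                "hdfs://nn/logs/a.gz", "hdfs://nn/logs/b.gz", "hdfs://nn/data/c.txt");
        List<Pool> pools = List.of(
                new Pool("logs", u -> u.getPath().startsWith("/logs/")),
                new Pool("data", u -> u.getPath().startsWith("/data/")));

        parseCount = 0;
        List<List<URI>> naive = assignNaive(files, pools);
        long naiveParses = parseCount;

        parseCount = 0;
        List<List<URI>> cached = assignCached(files, pools);
        long cachedParses = parseCount;

        System.out.println("naive parses=" + naiveParses
                + ", cached parses=" + cachedParses
                + ", same result=" + naive.equals(cached));
    }
}
```

Running main prints "naive parses=6, cached parses=3, same result=true". Both strategies assign the same files to each pool, but the cached version does the String-to-path conversion once per file. At the scale in this report (10000 pools, 30000 files), that is 30000 conversions instead of up to 300 million.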