Improve performance of CombineFileInputFormat when multiple pools are configured
--------------------------------------------------------------------------------

                 Key: MAPREDUCE-1423
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1423
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: client
            Reporter: dhruba borthakur
            Assignee: dhruba borthakur


I have a map-reduce job that uses CombineFileInputFormat. It has configured
10000 pools and 30000 files. Creating the splits takes more than an hour.
The reason is that CombineFileInputFormat.getSplits() converts the same path
from a String to a Path object multiple times, once for each pool. Similarly,
it calls Path.toUri() multiple times. This code can be optimized so that each
conversion is performed only once.
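The following is a minimal sketch of the optimization described above, not the actual Hadoop patch. To keep it self-contained, it assumes java.net.URI as a stand-in for Hadoop's Path (and for the result of Path.toUri()), and it models each pool as a Predicate<URI> rather than a PathFilter. The class and method names (PoolAssignmentSketch, assignNaive, assignCached) are hypothetical. assignNaive mirrors the per-pool reconversion reported in this issue. assignCached converts each path once and reuses the converted objects for every pool.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: java.net.URI stands in for Hadoop's Path/toUri(),
// and a Predicate<URI> stands in for a pool's PathFilter.
class PoolAssignmentSketch {
    // Counts String-to-URI conversions so the savings are observable.
    static int conversions = 0;

    static URI convert(String path) {
        conversions++;
        return URI.create(path);
    }

    // Pattern described in the issue: every pool re-converts every path,
    // so conversions grow as (number of pools) * (number of files).
    static List<List<URI>> assignNaive(List<String> paths, List<Predicate<URI>> pools) {
        List<List<URI>> result = new ArrayList<>();
        for (Predicate<URI> pool : pools) {
            List<URI> members = new ArrayList<>();
            for (String p : paths) {
                URI uri = convert(p);
                if (pool.test(uri)) {
                    members.add(uri);
                }
            }
            result.add(members);
        }
        return result;
    }

    // Optimized: convert each path exactly once up front, then reuse the
    // converted objects when testing them against every pool.
    static List<List<URI>> assignCached(List<String> paths, List<Predicate<URI>> pools) {
        List<URI> converted = new ArrayList<>(paths.size());
        for (String p : paths) {
            converted.add(convert(p));
        }
        List<List<URI>> result = new ArrayList<>();
        for (Predicate<URI> pool : pools) {
            List<URI> members = new ArrayList<>();
            for (URI uri : converted) {
                if (pool.test(uri)) {
                    members.add(uri);
                }
            }
            result.add(members);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> paths = List.of("hdfs://nn/a/1", "hdfs://nn/a/2", "hdfs://nn/b/1");
        Predicate<URI> poolA = u -> u.getPath().startsWith("/a/");
        Predicate<URI> poolB = u -> u.getPath().startsWith("/b/");
        List<Predicate<URI>> pools = List.of(poolA, poolB);

        conversions = 0;
        assignNaive(paths, pools);
        System.out.println("naive conversions: " + conversions);

        conversions = 0;
        assignCached(paths, pools);
        System.out.println("cached conversions: " + conversions);
    }
}
```

With the numbers in this report, a loop shaped like assignNaive could perform up to 10000 x 30000 = 300,000,000 conversions, compared with 30000 for assignCached. Both variants produce the same pool assignments.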

-- 
This message is automatically generated by JIRA.