[ https://issues.apache.org/jira/browse/MAPREDUCE-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840537#action_12840537 ]
Hudson commented on MAPREDUCE-1423:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #256 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/256/])
Improve performance of CombineFileInputFormat when multiple pools are configured. (Dhruba Borthakur via zshao)

> Improve performance of CombineFileInputFormat when multiple pools are configured
> --------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1423
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1423
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: client
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.22.0
>
>         Attachments: CombineFileInputFormatPerformance.txt, CombineFileInputFormatPerformance.txt
>
>
> I have a map-reduce job that uses CombineFileInputFormat. It is configured
> with 10000 pools and 30000 files. Creating the splits takes more than an
> hour. The reason is that CombineFileInputFormat.getSplits() converts the
> same path from a String to a Path object multiple times, once for each
> pool. Similarly, it calls Path.toUri() multiple times. This code can be
> optimized.
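
The cost described in the issue can be illustrated with a small standalone sketch. This is not the committed patch and it does not use Hadoop classes: java.net.URI stands in for the String to Path to URI conversion, and the class name, the prefix-based pool matching, and the conversion counter are all hypothetical, introduced only to make the effect visible. It assumes the fix amounts to converting each path once and reusing the result across pools.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class PathConversionSketch {
    // Counts String -> URI conversions so the two strategies can be compared.
    public static long conversions = 0;

    // Stand-in (assumption) for the String -> Path -> URI conversion in the issue.
    public static URI parse(String path) {
        conversions++;
        return URI.create(path);
    }

    // Hypothetical pool filter: a pool accepts files under a path prefix.
    public static boolean accepts(String poolPrefix, URI file) {
        return file.getPath().startsWith(poolPrefix);
    }

    // Converts every file path again for every pool: pools * files conversions.
    public static int matchNaive(List<String> pools, List<String> files) {
        int matches = 0;
        for (String pool : pools) {
            for (String file : files) {
                if (accepts(pool, parse(file))) {
                    matches++;
                }
            }
        }
        return matches;
    }

    // Converts every file path once up front and reuses it: files conversions.
    public static int matchCached(List<String> pools, List<String> files) {
        List<URI> parsed = new ArrayList<>();
        for (String file : files) {
            parsed.add(parse(file));
        }
        int matches = 0;
        for (String pool : pools) {
            for (URI file : parsed) {
                if (accepts(pool, file)) {
                    matches++;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> pools = new ArrayList<>();
        List<String> files = new ArrayList<>();
        for (int p = 0; p < 10; p++) {
            pools.add("/data/p" + p + "/");
        }
        for (int f = 0; f < 30; f++) {
            files.add("hdfs://nn/data/p" + (f % 10) + "/f" + f);
        }

        conversions = 0;
        int naive = matchNaive(pools, files);
        System.out.println("naive:  matches=" + naive + " conversions=" + conversions);

        conversions = 0;
        int cached = matchCached(pools, files);
        System.out.println("cached: matches=" + cached + " conversions=" + conversions);
    }
}
```

Both methods find the same matches, but the naive loop performs one conversion per (pool, file) pair while the cached version performs one per file. At the scale reported in the issue, 10000 pools and 30000 files, that is the difference between 300 million conversions and 30000.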