[jira] Commented: (MAPREDUCE-1423) Improve performance of CombineFileInputFormat when multiple pools are configured

Hudson (JIRA) Wed, 03 Mar 2010 00:51:53 -0800

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840537#action_12840537
 ]


Hudson commented on MAPREDUCE-1423:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #256 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/256/])
    . Improve performance of CombineFileInputFormat when multiple pools are 
configured. (Dhruba Borthakur via zshao)


> Improve performance of CombineFileInputFormat when multiple pools are 
> configured
> --------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1423
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1423
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: client
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.22.0
>
>         Attachments: CombineFileInputFormatPerformance.txt, 
> CombineFileInputFormatPerformance.txt
>
>
> I have a map-reduce job that uses CombineFileInputFormat. It is configured 
> with 10000 pools and 30000 files. Creating the splits takes more than an 
> hour. The reason is that CombineFileInputFormat.getSplits() converts the 
> same path from a String to a Path object multiple times, once for each 
> pool. Similarly, it calls Path.toUri() multiple times. This code can be 
> optimized.
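
The cost pattern described in the report can be illustrated with a small, self-contained sketch. This is not the attached patch and does not use the Hadoop API: the class PoolSplitSketch, the prefix-based Pool type, and the conversions counter are all hypothetical, and java.net.URI stands in for the String-to-Path and Path.toUri() conversions. It only shows why hoisting the per-file conversion out of the per-pool loop reduces the number of conversions from files x pools to files.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PoolSplitSketch {
    /** Hypothetical pool: accepts any file whose URI path starts with a prefix. */
    static final class Pool {
        final String name;
        final String prefix;

        Pool(String name, String prefix) {
            this.name = name;
            this.prefix = prefix;
        }

        boolean accepts(String uriPath) {
            return uriPath.startsWith(prefix);
        }
    }

    /** Counts conversions so the two strategies can be compared. */
    static int conversions = 0;

    /** Stand-in for the String -> Path -> URI path conversion in the report. */
    static String toUriPath(String file) {
        conversions++;
        return URI.create(file).getPath();
    }

    /** Shape described in the report: every pool re-converts every file. */
    static Map<String, List<String>> assignNaive(List<String> files, List<Pool> pools) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Pool pool : pools) {
            List<String> members = new ArrayList<>();
            for (String file : files) {
                String path = toUriPath(file); // repeated once per pool
                if (pool.accepts(path)) {
                    members.add(file);
                }
            }
            result.put(pool.name, members);
        }
        return result;
    }

    /** Assumed optimization: convert each file once, then reuse the result. */
    static Map<String, List<String>> assignCached(List<String> files, List<Pool> pools) {
        List<String> paths = new ArrayList<>(files.size());
        for (String file : files) {
            paths.add(toUriPath(file)); // exactly once per file
        }
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Pool pool : pools) {
            List<String> members = new ArrayList<>();
            for (int i = 0; i < files.size(); i++) {
                if (pool.accepts(paths.get(i))) {
                    members.add(files.get(i));
                }
            }
            result.put(pool.name, members);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
            "hdfs://nn/logs/a/part-0", "hdfs://nn/logs/b/part-0", "hdfs://nn/logs/a/part-1");
        List<Pool> pools = Arrays.asList(new Pool("a", "/logs/a/"), new Pool("b", "/logs/b/"));

        conversions = 0;
        assignNaive(files, pools);
        System.out.println("naive conversions:  " + conversions);

        conversions = 0;
        assignCached(files, pools);
        System.out.println("cached conversions: " + conversions);
    }
}
```

With the 3 files and 2 pools in main, the naive variant performs 6 conversions and the cached variant performs 3. At the scale in the report (30000 files, 10000 pools) the same ratio is what separates 300 million conversions from 30000.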

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
