[ 
https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447584#comment-13447584
 ] 

Johannes Zillmann commented on MAPREDUCE-207:
---------------------------------------------

Currently, in our Hadoop applications, we calculate the splits before we 
submit the job to the client (the client then simply looks up the existing 
splits). We do that mainly to influence the reducer count based on the number 
of splits/map tasks.
If Hadoop does the splitting on the cluster (which makes sense), it would be 
nice to have a hook to influence the configuration once the splits are known.
Sometimes it also makes sense for us to decide on the map-reduce assembly 
after we know the splits (different join strategies for different data 
constellations).
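
To make the hook idea a bit more concrete, here is a rough sketch of what such 
a callback could look like. Everything in it is an assumption for 
illustration: SplitsComputedHook and ReducerCountHook are hypothetical names 
and not part of Hadoop, the configuration is stood in for by a plain Map so 
the example has no Hadoop dependency, and the splits-per-reducer ratio is 
arbitrary. The only real name is the "mapreduce.job.reduces" property 
("mapred.reduce.tasks" in older releases).

```java
import java.util.Map;

/** Hypothetical hook (not a Hadoop API): invoked once the splits are known. */
interface SplitsComputedHook {
    void onSplitsComputed(int numSplits, Map<String, String> conf);
}

/** Example hook that derives the reducer count from the number of splits. */
final class ReducerCountHook implements SplitsComputedHook {
    private final int splitsPerReducer;
    private final int maxReducers;

    ReducerCountHook(int splitsPerReducer, int maxReducers) {
        if (splitsPerReducer < 1 || maxReducers < 1) {
            throw new IllegalArgumentException("both arguments must be >= 1");
        }
        this.splitsPerReducer = splitsPerReducer;
        this.maxReducers = maxReducers;
    }

    /** Ceiling of numSplits / splitsPerReducer, clamped to [1, maxReducers]. */
    static int reducersFor(int numSplits, int splitsPerReducer, int maxReducers) {
        int wanted = (numSplits + splitsPerReducer - 1) / splitsPerReducer;
        return Math.max(1, Math.min(maxReducers, wanted));
    }

    @Override
    public void onSplitsComputed(int numSplits, Map<String, String> conf) {
        int reducers = reducersFor(numSplits, splitsPerReducer, maxReducers);
        // Stand-in for setting the reducer count on the real job configuration.
        conf.put("mapreduce.job.reduces", Integer.toString(reducers));
    }
}
```

A second hook of the same shape could pick a join strategy from the split 
layout before the job is assembled; how that would be wired into the cluster 
side is left open here.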

Just dumping some ideas here...

                
> Computing Input Splits on the MR Cluster
> ----------------------------------------
>
>                 Key: MAPREDUCE-207
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-207
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: applicationmaster, mrv2
>            Reporter: Philip Zeyliger
>            Assignee: Arun C Murthy
>         Attachments: MAPREDUCE-207.patch
>
>
> Instead of computing the input splits as part of job submission, Hadoop could 
> have a separate "job task type" that computes the input splits, therefore 
> allowing that computation to happen on the cluster.
