[ https://issues.apache.org/jira/browse/MAPREDUCE-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447584#comment-13447584 ]
Johannes Zillmann commented on MAPREDUCE-207:
---------------------------------------------
Currently, in our Hadoop applications, we calculate the splits before we submit
the job to the client (the client then simply looks up the existing splits). We
do that mainly to influence the reducer count based on the number of
splits/map tasks.
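
To illustrate the kind of calculation meant here, a minimal sketch follows. The
class name, the splits-per-reducer ratio and the upper bound are assumptions
made up for this example and are not taken from our actual code. In a real job
the split count would presumably come from InputFormat.getSplits(...) and the
result would be passed to Job.setNumReduceTasks(int).

```java
public class ReducerCountSketch {

    /**
     * Derives a reducer count from the number of input splits.
     * splitsPerReducer and maxReducers are hypothetical tuning knobs.
     */
    static int reducersFor(int numSplits, int splitsPerReducer, int maxReducers) {
        if (numSplits < 0 || splitsPerReducer <= 0 || maxReducers <= 0) {
            throw new IllegalArgumentException("arguments out of range");
        }
        // Integer ceiling of numSplits / splitsPerReducer.
        int wanted = (numSplits + splitsPerReducer - 1) / splitsPerReducer;
        // Keep the result between 1 and maxReducers.
        return Math.max(1, Math.min(maxReducers, wanted));
    }

    public static void main(String[] args) {
        // Assumed split count; in practice this is only known once the
        // splits have been computed.
        int numSplits = 10;
        System.out.println(reducersFor(numSplits, 4, 100));
    }
}
```

The point is that the reducer count depends on a number that is only available
after the splits have been computed.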
If Hadoop does the splitting on the cluster (which makes sense), it would be
nice to have a hook to influence the job configuration once the splits are known!
Sometimes it also makes sense for us to decide on the map-reduce assembly only
after we know the splits (different join strategies for different data
constellations).
Just dumping some ideas here...
> Computing Input Splits on the MR Cluster
> ----------------------------------------
>
> Key: MAPREDUCE-207
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-207
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Components: applicationmaster, mrv2
> Reporter: Philip Zeyliger
> Assignee: Arun C Murthy
> Attachments: MAPREDUCE-207.patch
>
>
> Instead of computing the input splits as part of job submission, Hadoop could
> have a separate "job task type" that computes the input splits, therefore
> allowing that computation to happen on the cluster.