[ https://issues.apache.org/jira/browse/MAPREDUCE-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13800644#comment-13800644 ]

Ben Podgursky commented on MAPREDUCE-199:
-----------------------------------------

Hey Harsh. The delay was because I've only worked with MR1 so far (Cloudera
Hadoop 4), and all of my source suggestions were in the context of MR1, so I
spent some time checking what changed in the source between MR1 and MR2.

After looking around, your patch seems like a pretty nice way of enabling this
functionality without baking anything else into the API or complicating the
code, since it builds on locality logic that already exists.

The other alternative I was thinking about was making the logic pluggable via
the JobConf, similar to how the partitioner is set, e.g.:

conf.setReduceTaskLocalizer(MyLocalityLogic.class);

where MyLocalityLogic would contain the logic for assigning reduce tasks to
hosts. I'm not sure how it would work, though, since (1) I don't know whether
user code is on the classpath at the time tasks are assigned to nodes, and (2)
the locality logic would need access to the whole network topology to make any
intelligent decisions, and I'm not sure where that would come from.
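To make the pluggable idea concrete, here is a minimal sketch in plain Java. Everything in it is hypothetical: Hadoop has no ReduceTaskLocalizer interface or setReduceTaskLocalizer method, and the host-to-rack map is a stand-in for whatever topology the framework would have to hand the plug-in (point (2) above). It assumes the job knows a preferred host per partition and falls back from node-local to rack-local to any free host, the same preference order Hadoop uses when scheduling map tasks.

```java
import java.util.*;

// Hypothetical plug-in interface; not part of Hadoop.
interface ReduceTaskLocalizer {
    // hostToRack: every cluster host mapped to its rack (stand-in topology).
    // freeHosts: hosts that currently have a free reduce slot.
    // Returns the chosen host, or null if no host is free.
    String assignHost(int partition, Map<String, String> hostToRack, Set<String> freeHosts);
}

// Hypothetical policy: node-local, then rack-local, then any free host.
class MyLocalityLogic implements ReduceTaskLocalizer {
    private final Map<Integer, String> preferredHost; // partition -> host holding its side data

    MyLocalityLogic(Map<Integer, String> preferredHost) {
        this.preferredHost = preferredHost;
    }

    @Override
    public String assignHost(int partition, Map<String, String> hostToRack, Set<String> freeHosts) {
        if (freeHosts.isEmpty()) {
            return null;
        }
        SortedSet<String> sortedFree = new TreeSet<>(freeHosts); // sorted for deterministic picks
        String want = preferredHost.get(partition);
        if (want != null) {
            if (freeHosts.contains(want)) {
                return want; // node-local
            }
            String rack = hostToRack.get(want); // null if the host is unknown
            for (String h : sortedFree) {
                if (rack != null && rack.equals(hostToRack.get(h))) {
                    return h; // rack-local
                }
            }
        }
        // Off-rack fallback: spread partitions over the free hosts.
        List<String> hosts = new ArrayList<>(sortedFree);
        return hosts.get(partition % hosts.size());
    }
}

class LocalizerDemo {
    public static void main(String[] args) {
        Map<String, String> topology = new HashMap<>();
        topology.put("h1", "rack1");
        topology.put("h2", "rack1");
        topology.put("h3", "rack2");
        topology.put("h4", "rack2");
        Set<String> free = new HashSet<>(Arrays.asList("h2", "h3", "h4"));

        Map<Integer, String> preferred = new HashMap<>();
        preferred.put(0, "h3"); // free: node-local
        preferred.put(1, "h1"); // busy: falls back to h2 on the same rack
        ReduceTaskLocalizer localizer = new MyLocalityLogic(preferred);
        for (int p = 0; p < 3; p++) {
            System.out.println("partition " + p + " -> " + localizer.assignHost(p, topology, free));
        }
    }
}
```

Even in this toy form, the rack-local step only works because the plug-in is given the full host-to-rack map, which is exactly the open question in (2).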

> Locality hints for Reduce
> -------------------------
>
>                 Key: MAPREDUCE-199
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-199
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: applicationmaster, mrv2
>            Reporter: Benjamin Reed
>            Assignee: Harsh J
>         Attachments: MAPREDUCE-199.patch, MAPREDUCE-199.patch
>
>
> It would be nice if we could add a method to OutputFormat that would allow a
> job to indicate where the reducer for a given partition should run. This
> is similar to the getSplits() method on InputFormat. In our application the
> reducer uses other data in addition to the map outputs during processing,
> and data accesses could be made more efficient if the JobTracker scheduled
> the reducers to run on specific hosts.
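The request is analogous to InputSplit.getLocations(): a per-partition host hint that the scheduler treats as a preference, not a requirement. A minimal sketch in plain Java, with no Hadoop dependencies, assuming a hypothetical getReduceLocations(partition, numReduceTasks) hook (not an existing OutputFormat method) and a simplified, hypothetical scheduler step that, when a host frees up, picks a pending partition hinted for that host before falling back to the oldest pending one.

```java
import java.util.*;

// Hypothetical hook, not part of Hadoop's OutputFormat API. Like
// InputSplit.getLocations(), it returns preferred hosts (possibly none).
interface ReduceLocationHints {
    String[] getReduceLocations(int partition, int numReduceTasks);
}

// Example hints: side data is sharded by partition and each shard
// lives on a known set of hosts.
class ShardedSideDataHints implements ReduceLocationHints {
    private final Map<Integer, String[]> shardHosts;

    ShardedSideDataHints(Map<Integer, String[]> shardHosts) {
        this.shardHosts = shardHosts;
    }

    @Override
    public String[] getReduceLocations(int partition, int numReduceTasks) {
        return shardHosts.getOrDefault(partition, new String[0]);
    }
}

// Simplified scheduler step: when freeHost has a slot, prefer a pending
// partition hinted for that host; otherwise take the oldest pending one.
// Returns -1 when nothing is pending.
class HintedReduceScheduler {
    static int pickPartition(String freeHost, Deque<Integer> pending,
                             ReduceLocationHints hints, int numReduceTasks) {
        for (Iterator<Integer> it = pending.iterator(); it.hasNext(); ) {
            int p = it.next();
            for (String h : hints.getReduceLocations(p, numReduceTasks)) {
                if (h.equals(freeHost)) {
                    it.remove();
                    return p;
                }
            }
        }
        Integer oldest = pending.pollFirst(); // hints are preferences, not requirements
        return oldest == null ? -1 : oldest;
    }
}

class ReduceHintDemo {
    public static void main(String[] args) {
        Map<Integer, String[]> shards = new HashMap<>();
        shards.put(0, new String[] {"hostA", "hostB"});
        shards.put(1, new String[] {"hostC"});
        ReduceLocationHints hints = new ShardedSideDataHints(shards);
        Deque<Integer> pending = new ArrayDeque<>(Arrays.asList(0, 1, 2));
        for (String host : new String[] {"hostC", "hostZ", "hostA"}) {
            System.out.println(host + " -> partition "
                + HintedReduceScheduler.pickPartition(host, pending, hints, 3));
        }
    }
}
```

Falling back to any pending partition keeps a missing or stale hint from stalling the job, which is the same trade-off map scheduling makes with split locations.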



--
This message was sent by Atlassian JIRA
(v6.1#6144)
