[
https://issues.apache.org/jira/browse/HADOOP-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682902#action_12682902
]
Owen O'Malley commented on HADOOP-5381:
---------------------------------------
I don't think this is a good idea.
1. The client should not be using the network topology. There is no guarantee
that it works correctly anywhere but on the job tracker and name node.
2. The locations returned by the split aren't ordered. They are treated as a
set. So sorting them is pointless.
I think the proper fix here is just to use some heuristic, like keeping all of
the hosts whose total is within 50% of the total of the highest host.
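A minimal sketch of that heuristic follows. It is not taken from the attached patch. It reads "within 50%" as "at least half of the largest per-host total", which is an assumption, and the class name SplitHostFilter, the method keepNearTop, and the bytes-per-host map are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: filters candidate split hosts
// by how many bytes of the split each host holds.
final class SplitHostFilter {

    // Returns every host whose byte total is at least `fraction` of the
    // largest byte total. The result is an unordered candidate set; the
    // list order simply follows the map's iteration order.
    static List<String> keepNearTop(Map<String, Long> bytesPerHost, double fraction) {
        long max = 0L;
        for (long bytes : bytesPerHost.values()) {
            max = Math.max(max, bytes);
        }

        List<String> kept = new ArrayList<>();
        if (max == 0L) {
            return kept; // no host holds any data for this split
        }

        // Assumption: "within 50%" means total >= 0.5 * highest total.
        double threshold = max * fraction;
        for (Map.Entry<String, Long> entry : bytesPerHost.entrySet()) {
            if (entry.getValue() >= threshold) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }
}
```

Calling keepNearTop(totals, 0.5) needs no topology information on the client and does not rely on the order of the returned hosts, which matches both points above.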
> Extend HADOOP-3293 to MapReduce package also
> --------------------------------------------
>
> Key: HADOOP-5381
> URL: https://issues.apache.org/jira/browse/HADOOP-5381
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Jothi Padmanabhan
> Assignee: Jothi Padmanabhan
> Fix For: 0.21.0
>
> Attachments: hadoop-5381.patch
>
>
> HADOOP-3293 made changes to FileInputFormat to identify split locations that
> contribute most to the split. This functionality has to be added to the
> MapReduce.FileInputFormat too.