[ https://issues.apache.org/jira/browse/HADOOP-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12683922#action_12683922 ]
Owen O'Malley commented on HADOOP-5381:
---------------------------------------
1. It should just use a map from hostname String to LongWritable to keep track
of the number of bytes on each node. That would be much clearer. It should not
be building topology information here. There is a lot of totally unmotivated
code in this patch.
2. It is not at all clear that picking the top N locations is right, where N is
the replication factor. I think a heuristic that includes the top node and
any node within 50% of its data size would be more appropriate.
> Extend HADOOP-3293 to MapReduce package also
> --------------------------------------------
>
> Key: HADOOP-5381
> URL: https://issues.apache.org/jira/browse/HADOOP-5381
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Jothi Padmanabhan
> Assignee: Jothi Padmanabhan
> Fix For: 0.21.0
>
> Attachments: hadoop-5381.patch
>
>
> HADOOP-3293 made changes to FileInputFormat to identify split locations that
> contribute most to the split. This functionality has to be added to the
> MapReduce.FileInputFormat too.