[ 
https://issues.apache.org/jira/browse/HADOOP-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643943#action_12643943
 ] 

Jothi Padmanabhan commented on HADOOP-3293:
-------------------------------------------

bq. If we aggregate on a per-host basis, host A, having contributed 120 bytes, 
would be the ideal choice. However, if we choose Block 1 as the index to be 
returned, even hosts B & C would be treated as data local, which is suboptimal.

To make this clear: having decided that A is a good host, we also need a good 
way to pick the correct block from the list of blocks that reside on A. In this 
case, we have to choose between Block 1 and Block 2. Choosing Block 1 is 
suboptimal, because hosts B & C hold only 20 bytes of the split.
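A minimal sketch of the two-step selection described above, assuming a hypothetical layout consistent with the numbers in this thread: Block 1 holds 20 bytes of the split on hosts A, B, C and Block 2 holds 100 bytes on hosts A, D, E (hosts D and E are made up for illustration). The class and method names (SplitHostPicker, BlockInSplit, bestHost, bestBlockIndex) are hypothetical and not part of Hadoop's API, and the "largest contribution on the chosen host" rule in step 2 is just one possible heuristic, not necessarily what the final patch does.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SplitHostPicker {

    // Hypothetical view of one block overlapping a split: how many bytes of
    // the split fall in this block, and which hosts hold a replica of it.
    static final class BlockInSplit {
        final long bytesInSplit;
        final List<String> hosts;

        BlockInSplit(long bytesInSplit, String... hosts) {
            this.bytesInSplit = bytesInSplit;
            this.hosts = Arrays.asList(hosts);
        }
    }

    // Step 1: sum, per host, the split bytes it holds across all blocks,
    // and return the host with the largest total.
    static String bestHost(List<BlockInSplit> blocks) {
        Map<String, Long> bytesPerHost = new HashMap<>();
        for (BlockInSplit b : blocks) {
            for (String h : b.hosts) {
                bytesPerHost.merge(h, b.bytesInSplit, Long::sum);
            }
        }
        String best = null;
        long bestBytes = -1;
        for (Map.Entry<String, Long> e : bytesPerHost.entrySet()) {
            if (e.getValue() > bestBytes) {
                best = e.getKey();
                bestBytes = e.getValue();
            }
        }
        return best;
    }

    // Step 2 (assumed heuristic): among the blocks with a replica on the
    // chosen host, pick the one contributing the most bytes, so that its
    // other replica hosts (also treated as data local) hold as much of the
    // split as possible. Returns -1 if no block lives on the host.
    static int bestBlockIndex(List<BlockInSplit> blocks, String host) {
        int bestIndex = -1;
        long bestBytes = -1;
        for (int i = 0; i < blocks.size(); i++) {
            BlockInSplit b = blocks.get(i);
            if (b.hosts.contains(host) && b.bytesInSplit > bestBytes) {
                bestIndex = i;
                bestBytes = b.bytesInSplit;
            }
        }
        return bestIndex;
    }

    public static void main(String[] args) {
        // Hypothetical layout: Block 1 = 20 bytes on A, B, C;
        // Block 2 = 100 bytes on A, D, E. Host A totals 120 bytes.
        List<BlockInSplit> blocks = Arrays.asList(
            new BlockInSplit(20, "A", "B", "C"),
            new BlockInSplit(100, "A", "D", "E"));
        String host = bestHost(blocks);
        int index = bestBlockIndex(blocks, host);
        System.out.println(host + " -> block index " + index);
    }
}
```

Running main prints "A -> block index 1". Indices are 0-based, so index 1 is Block 2, which means the hosts treated as data local alongside A (D and E) each hold 100 bytes of the split rather than 20. A variant could instead score each candidate block by the per-host totals of its replica hosts; that only matters when blocks on the chosen host have similar sizes.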


> When an input split spans cross block boundary, the split location should be 
> the host having most of bytes on it. 
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3293
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3293
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Jothi Padmanabhan
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
