[ https://issues.apache.org/jira/browse/HADOOP-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643943#action_12643943 ]
Jothi Padmanabhan commented on HADOOP-3293:
-------------------------------------------

bq. If we aggregate on a per-host basis, host A, having contributed 120 bytes, would be the ideal choice. However, if we choose Block 1 as the index to be returned, even hosts B & C would be treated as data-local, which is suboptimal.

To make this clear: having decided that A is a good host, we also need a good way to pick the correct block from the list of blocks that reside on A. In this case, we should choose between Block 1 and Block 2. Choosing Block 1 is not optimal, because hosts B & C hold only 20 bytes.

> When an input split spans cross block boundary, the split location should be
> the host having most of bytes on it.
> ------------------------------------------------------------------
>
>                 Key: HADOOP-3293
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3293
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Jothi Padmanabhan
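
The two-step choice discussed in the comment (first pick the host, then pick a block on that host) can be sketched as below. This is only an illustration, not the patch attached to the issue. The function name `pick_host_and_block`, the hosts D and E, and the rule "take the block on the chosen host that contributes the most bytes" are all assumptions made for the example. The comment itself gives only the 120 bytes on host A and the 20 bytes on hosts B and C.

```python
from collections import defaultdict


def pick_host_and_block(blocks):
    """Hypothetical helper: blocks is a list of (name, bytes_in_split, hosts)."""
    # Step 1: aggregate the split's bytes on a per-host basis.
    bytes_per_host = defaultdict(int)
    for _name, nbytes, hosts in blocks:
        for host in hosts:
            bytes_per_host[host] += nbytes

    # The host holding the most bytes of the split is the preferred location.
    best_host = max(bytes_per_host, key=bytes_per_host.get)

    # Step 2 (assumed rule): among the blocks residing on that host,
    # choose the one that contributes the most bytes to the split.
    candidates = [b for b in blocks if best_host in b[2]]
    best_block = max(candidates, key=lambda b: b[1])
    return best_host, bytes_per_host[best_host], best_block[0]


# Assumed layout: Block 1 contributes 20 bytes and lives on A, B, C;
# Block 2 contributes 100 bytes and lives on A plus hypothetical hosts D, E.
blocks = [
    ("Block1", 20, ["A", "B", "C"]),
    ("Block2", 100, ["A", "D", "E"]),
]
print(pick_host_and_block(blocks))
```

Under these assumed numbers, host A aggregates 120 bytes and Block 2 is selected, so hosts B and C (20 bytes) are not treated as data-local.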