[ https://issues.apache.org/jira/browse/HADOOP-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643989#action_12643989 ]
Jothi Padmanabhan commented on HADOOP-3293:
-------------------------------------------

Since blkIndex is used only to identify the hosts,

{code}
int blkIndex = getBlockIndex(blkLocations, length-bytesRemaining, splitSize);
splits.add(new FileSplit(path, length-bytesRemaining, splitSize,
                         blkLocations[blkIndex].getHosts()));
{code}

we could also modify getBlockIndex() to return a list of the hosts that hold the most data for that split. For example, if the split consisted of

Block 1    80 bytes   Hosts A, B, C
Block 2   100 bytes   Hosts A, D, E
Block 3    70 bytes   Hosts D, F, B

we would identify the hosts and their contributions as

A  180
B  150
C   80
D  170
E  100
F   70

and could return A, D, B.

> When an input split spans cross block boundary, the split location should be the host having most of bytes on it.
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3293
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3293
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Jothi Padmanabhan
>

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
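
The host-ranking idea in the comment above could be sketched roughly as follows. This is only an illustration, not the actual patch: SplitHostRanker, BlockPart, and topHosts are hypothetical names, and BlockPart is a plain stand-in for the portion of a block that falls inside the split, so the sketch does not depend on Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Hadoop: ranks hosts by bytes held for one split. */
final class SplitHostRanker {

    /** Hypothetical stand-in for the part of a block that falls inside the split. */
    static final class BlockPart {
        final long bytes;
        final List<String> hosts;

        BlockPart(long bytes, String... hosts) {
            this.bytes = bytes;
            this.hosts = Arrays.asList(hosts);
        }
    }

    /** Returns up to maxHosts host names, ordered by bytes contributed, largest first. */
    static List<String> topHosts(List<BlockPart> parts, int maxHosts) {
        // Sum each host's bytes; LinkedHashMap keeps hosts in first-seen order.
        Map<String, Long> bytesPerHost = new LinkedHashMap<>();
        for (BlockPart part : parts) {
            for (String host : part.hosts) {
                bytesPerHost.merge(host, part.bytes, Long::sum);
            }
        }

        List<Map.Entry<String, Long>> ranked = new ArrayList<>(bytesPerHost.entrySet());
        // List.sort is stable, so hosts with equal totals keep first-seen order.
        ranked.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));

        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> entry : ranked) {
            if (result.size() == maxHosts) {
                break;
            }
            result.add(entry.getKey());
        }
        return result;
    }
}
```

With the three blocks from the example (80 bytes on A, B, C; 100 bytes on A, D, E; 70 bytes on D, F, B), topHosts(parts, 3) yields A, D, B. Returning three hosts is an assumption taken from the example, not a requirement stated in the issue.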