[ https://issues.apache.org/jira/browse/HADOOP-4567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644897#action_12644897 ]
Jothi Padmanabhan commented on HADOOP-4567:
-------------------------------------------

This looks good. The only debatable point is whether to introduce a new 'racks' variable in BlockLocations or to just prefix the network topology in the hosts variable itself.

Having a separate racks variable carries an implicit requirement that hosts and racks are ordered identically (which is true in this case). However, a separate racks variable does make it easier to handle cases where no topology information is available: a simple check on the racks variable would do, instead of adding logic to the parsing of host names.

I am fine with either approach.

> GetFileBlockLocations should return the NetworkTopology information of the machines that hosts those blocks
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4567
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4567
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: dfsRackLocation.patch
>
>
> MultiFileInputFormat and FileInputFormat should use block locality information to construct splits.
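
To make the trade-off in the comment above concrete, here is a minimal Java sketch of the two options. It is not taken from dfsRackLocation.patch: the class name RackLocationSketch, the method names, and the "/rack/host" string format are all assumptions made for illustration only.

```java
// Hypothetical sketch; none of these names come from the patch or from Hadoop.
final class RackLocationSketch {

    // Option A: a separate racks array kept parallel to hosts.
    // Assumption: racks[i] describes hosts[i], and racks == null means
    // that no topology information is available.
    static String rackOfParallel(String[] hosts, String[] racks, int i) {
        if (racks == null) {
            return null; // the "simple check on the racks variable"
        }
        if (racks.length != hosts.length) {
            // The implicit ordering requirement, made explicit.
            throw new IllegalArgumentException(
                "hosts and racks must have the same length and order");
        }
        return racks[i];
    }

    // Option B: the topology is prefixed to the host name itself.
    // Assumption: entries look like "/rack1/host1", or just "host1"
    // when no topology information is available.
    static String rackOfPrefixed(String entry) {
        int slash = entry.lastIndexOf('/');
        if (slash <= 0) {
            return null; // no rack prefix present
        }
        return entry.substring(0, slash);
    }

    // Option B also needs parsing to recover the plain host name.
    static String hostOfPrefixed(String entry) {
        return entry.substring(entry.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        String[] hosts = {"host1", "host2"};
        String[] racks = {"/rack1", "/rack2"};
        System.out.println(rackOfParallel(hosts, racks, 1));
        System.out.println(rackOfParallel(hosts, null, 0));
        System.out.println(rackOfPrefixed("/rack1/host1"));
        System.out.println(hostOfPrefixed("/rack1/host1"));
    }
}
```

With option A the missing-topology case is a single null check, but callers must trust that the two arrays stay aligned. With option B there is only one array to keep consistent, but every caller that wants the rack or the bare host name has to parse the string.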