[ https://issues.apache.org/jira/browse/HADOOP-4567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644908#action_12644908 ]
Jothi Padmanabhan commented on HADOOP-4567:
-------------------------------------------

Sorry, let me try to explain it better.

Let us assume that the replication factor is 3 and we have one block. GetFileBlockLocations would return {h1, h2, h3} in BlockLocations[0].hosts and {r1, r1, r2} in BlockLocations[0].racks, where h1 and h2 are in r1 and h3 is in r2. The topologyPath for each host is determined by its position in the array: the topologyPath for h1, which is at index 0 of the hosts array, is r1 (index 0 of the racks array).

An alternative would have been to return /r1/h1, /r1/h2 and /r2/h3 in the hosts array. That way, anybody who wants both the host and the rack information does not need to construct it by reading the same index in two arrays.

Does that make sense?

> GetFileBlockLocations should return the NetworkTopology information of the
> machines that hosts those blocks
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4567
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4567
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: dfsRackLocation.patch
>
>
> MultiFileInputFormat and FileInputFormat should use block locality
> information to construct splits.
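
To make the index pairing in the comment above concrete, here is a minimal sketch in plain Java. It assumes two parallel arrays shaped like the hosts and racks arrays in the example (h1..h3, r1..r2) and builds the combined /rack/host strings that the comment mentions as an alternative. The class name TopologyPathSketch and the method toTopologyPaths are hypothetical; they are not part of the Hadoop API or of dfsRackLocation.patch.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not Hadoop code.
class TopologyPathSketch {

    // Pairs hosts[i] with racks[i] and returns "/rack/host" for each index.
    static String[] toTopologyPaths(String[] hosts, String[] racks) {
        if (hosts.length != racks.length) {
            throw new IllegalArgumentException(
                "hosts and racks must have the same length");
        }
        String[] paths = new String[hosts.length];
        for (int i = 0; i < hosts.length; i++) {
            // Assumption: rack names carry no leading slash, as in the comment.
            paths[i] = "/" + racks[i] + "/" + hosts[i];
        }
        return paths;
    }

    public static void main(String[] args) {
        // Values taken from the example: replication factor 3, one block.
        String[] hosts = {"h1", "h2", "h3"};
        String[] racks = {"r1", "r1", "r2"};
        // prints [/r1/h1, /r1/h2, /r2/h3]
        System.out.println(Arrays.toString(toTopologyPaths(hosts, racks)));
    }
}
```

With two parallel arrays, every caller that needs both pieces of information has to repeat this loop; returning the combined path directly would move that work to the producer side.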