[ 
https://issues.apache.org/jira/browse/HADOOP-4567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644908#action_12644908
 ] 

Jothi Padmanabhan commented on HADOOP-4567:
-------------------------------------------

Sorry, let me try to explain it better. Assume that the replication factor 
is 3 and the file has one block. GetFileBlockLocations would return 
{h1, h2, h3} in BlockLocations[0].hosts and {r1, r1, r2} in 
BlockLocations[0].racks, where h1 and h2 are in r1 and h3 is in r2. The 
topologyPath for each host is determined by its position in the array: the 
topologyPath for h1, which is at index 0 of the hosts array, is r1 (index 0 
of the racks array). An alternative would be to return /r1/h1, /r1/h2 and 
/r2/h3 directly in hosts. That way, anyone who wants both the host and the 
rack information does not need to reconstruct it by reading the same index in 
two arrays. Does that make sense? 
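
To make the two layouts concrete, here is a minimal sketch in plain Java. It 
does not use the Hadoop API or the attached patch; the class name 
TopologyPathSketch and the helpers toTopologyPaths, rackOf and hostOf are 
hypothetical names chosen for illustration. It assumes a single-level rack 
name, as in the h1/r1 example above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical helper, not part of Hadoop or the patch.
final class TopologyPathSketch {

    // Layout 1 -> layout 2: pair hosts[i] with racks[i] into "/rack/host".
    static List<String> toTopologyPaths(String[] hosts, String[] racks) {
        if (hosts.length != racks.length) {
            throw new IllegalArgumentException(
                "hosts and racks must have the same length");
        }
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < hosts.length; i++) {
            paths.add("/" + racks[i] + "/" + hosts[i]);
        }
        return paths;
    }

    // Everything between the leading '/' and the last '/' is the rack.
    static String rackOf(String topologyPath) {
        return topologyPath.substring(1, topologyPath.lastIndexOf('/'));
    }

    // Everything after the last '/' is the host.
    static String hostOf(String topologyPath) {
        return topologyPath.substring(topologyPath.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        String[] hosts = {"h1", "h2", "h3"};
        String[] racks = {"r1", "r1", "r2"};
        for (String path : toTopologyPaths(hosts, racks)) {
            System.out.println(
                path + " host=" + hostOf(path) + " rack=" + rackOf(path));
        }
    }
}
```

With the combined form, a caller gets host and rack from one string and never 
has to keep two array indexes in step.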

> GetFileBlockLocations should return the NetworkTopology information of the 
> machines that hosts those blocks
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4567
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4567
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: dfsRackLocation.patch
>
>
> MultiFileInputFormat and FileInputFormat should use block locality 
> information to construct splits. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
