[ 
https://issues.apache.org/jira/browse/HADOOP-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492742
 ] 

dhruba borthakur commented on HADOOP-1296:
------------------------------------------

In the current incarnation, getHosts() returns only the names of the hosts on 
which a block resides. I would like to request that getPorts() be added to 
this API. This is needed to write *deterministic* test cases (e.g. 
TestDecommission) using DFSMiniCluster, where we have multiple datanodes on 
the same host and a host name alone cannot tell them apart. 
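
To illustrate the ambiguity, here is a minimal sketch, not the actual Hadoop 
API. The class HostPortSketch, its nested BlockInfo, the two helper methods, 
and the port numbers are all hypothetical and chosen only for illustration. 
It assumes getPorts() would return one port per entry of getHosts(), in the 
same order.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class HostPortSketch {

    /** Hypothetical block-location record; NOT the real Hadoop class. */
    static final class BlockInfo {
        private final String[] hosts;
        private final int[] ports;

        BlockInfo(String[] hosts, int[] ports) {
            if (hosts.length != ports.length) {
                throw new IllegalArgumentException("hosts and ports must have the same length");
            }
            this.hosts = hosts.clone();
            this.ports = ports.clone();
        }

        String[] getHosts() { return hosts.clone(); }

        // Assumed shape of the requested accessor: ports[i] belongs to hosts[i].
        int[] getPorts() { return ports.clone(); }
    }

    /** Distinct replica locations when only host names are available. */
    static Set<String> distinctByHost(BlockInfo block) {
        return new LinkedHashSet<>(Arrays.asList(block.getHosts()));
    }

    /** Distinct replica locations when host names are paired with ports. */
    static Set<String> distinctByHostAndPort(BlockInfo block) {
        Set<String> result = new LinkedHashSet<>();
        String[] hosts = block.getHosts();
        int[] ports = block.getPorts();
        for (int i = 0; i < hosts.length; i++) {
            result.add(hosts[i] + ":" + ports[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two datanodes on the same machine, as in a mini cluster (ports are arbitrary).
        BlockInfo block = new BlockInfo(
                new String[] {"localhost", "localhost"},
                new int[] {50010, 50011});

        System.out.println(distinctByHost(block));
        System.out.println(distinctByHostAndPort(block));
    }
}
```

With host names only, both replicas collapse into a single entry, so a test 
cannot tell which datanode holds the block. Pairing each host with its port 
keeps the two datanodes distinct.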

> Improve interface to FileSystem.getFileCacheHints
> -------------------------------------------------
>
>                 Key: HADOOP-1296
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1296
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Owen O'Malley
>         Assigned To: dhruba borthakur
>
> The FileSystem interface provides a very limited interface for finding the 
> location of the data. The current method looks like:
> String[][] getFileCacheHints(Path file, long start, long len) throws 
> IOException
> which returns a list of "block info" where the block info consists of a list 
> of host names. Because the hints don't include the information about where the 
> block boundaries are, map/reduce is required to call the name node for each 
> split. I'd propose that we fix the naming a bit and make it:
> public class BlockInfo implements Writable {
>   public long getStart();
>   public String[] getHosts();
> }
> BlockInfo[] getFileHints(Path file, long start, long len) throws IOException;
> So that map/reduce can query about the entire file and get the locations in a 
> single call.
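
As a rough sketch of how a caller might use the single-call result proposed 
above: once the start offset of every block is known, the hosts for any split 
can be looked up locally instead of asking the name node per split. The class 
FileHintsSketch, the helper hostsForOffset(), the host names, and the 64 MB 
block size are hypothetical. The sketch assumes the returned blocks are sorted 
by ascending start offset.

```java
public class FileHintsSketch {

    /** Hypothetical stand-in for the proposed BlockInfo; NOT the real Hadoop class. */
    static final class BlockInfo {
        private final long start;
        private final String[] hosts;

        BlockInfo(long start, String[] hosts) {
            this.start = start;
            this.hosts = hosts.clone();
        }

        long getStart() { return start; }

        String[] getHosts() { return hosts.clone(); }
    }

    /**
     * Returns the hosts of the block that contains the given file offset.
     * Assumes blocks are sorted by ascending start offset; a block is taken
     * to extend until the start of the next one.
     */
    static String[] hostsForOffset(BlockInfo[] blocks, long offset) {
        String[] result = new String[0];
        for (BlockInfo block : blocks) {
            if (block.getStart() > offset) {
                break; // this block and all later ones begin after the offset
            }
            result = block.getHosts();
        }
        return result;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Pretend this array came back from one getFileHints(file, 0, length) call.
        BlockInfo[] blocks = {
            new BlockInfo(0, new String[] {"node1", "node2"}),
            new BlockInfo(64 * mb, new String[] {"node2", "node3"}),
            new BlockInfo(128 * mb, new String[] {"node1", "node3"}),
        };

        // A split starting at 100 MB falls inside the second block.
        System.out.println(String.join(",", hostsForOffset(blocks, 100 * mb)));
    }
}
```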

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
