[ https://issues.apache.org/jira/browse/HADOOP-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492742 ]
dhruba borthakur commented on HADOOP-1296:
------------------------------------------

In the current incarnation, getHosts() returns the name of the host on which the block resides. I would like to request that getPorts() be added to this API. This is needed to write *deterministic* test cases (e.g. TestDecommission) using DFSMiniCluster, where we have multiple datanodes on the same host.

> Improve interface to FileSystem.getFileCacheHints
> -------------------------------------------------
>
>                 Key: HADOOP-1296
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1296
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Owen O'Malley
>         Assigned To: dhruba borthakur
>
> The FileSystem interface provides a very limited interface for finding the
> location of the data. The current method looks like:
>   String[][] getFileCacheHints(Path file, long start, long len) throws IOException
> which returns a list of "block info" where the block info consists of a list
> of host names. Because the hints don't include the information about where the
> block boundaries are, map/reduce is required to call the name node for each
> split. I'd propose that we fix the naming a bit and make it:
>   public class BlockInfo extends Writable {
>     public long getStart();
>     public String[] getHosts();
>   }
>   BlockInfo[] getFileHints(Path file, long start, long len) throws IOException;
> So that map/reduce can query about the entire file and get the locations in a
> single call.
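
To make the request concrete, here is a minimal Java sketch of what a BlockInfo carrying both getHosts() and the requested getPorts() might look like. It is an illustration, not the actual Hadoop class: the Writable serialization is left out so the example compiles without Hadoop on the classpath. The pairing of ports[i] with hosts[i], the BlockInfoSketch helper class, the findBlock() and sampleBlocks() methods, and the sample offsets and port numbers are all assumptions made for this example.

```java
// Hypothetical helper class, not part of Hadoop.
final class BlockInfoSketch {

    // Returns the index of the block containing the given file offset,
    // or -1 if the offset precedes the first block.
    // Assumes the blocks are sorted by ascending start offset.
    static int findBlock(BlockInfo[] blocks, long offset) {
        int found = -1;
        for (int i = 0; i < blocks.length; i++) {
            if (blocks[i].getStart() <= offset) {
                found = i;
            } else {
                break;
            }
        }
        return found;
    }

    // Made-up sample data: two blocks whose replicas live on two datanodes
    // running on the same host, distinguishable only by port.
    static BlockInfo[] sampleBlocks() {
        return new BlockInfo[] {
            new BlockInfo(0L, new String[] {"localhost", "localhost"}, new int[] {50010, 50011}),
            new BlockInfo(128L, new String[] {"localhost"}, new int[] {50011}),
        };
    }

    public static void main(String[] args) {
        BlockInfo[] blocks = sampleBlocks();
        BlockInfo block = blocks[findBlock(blocks, 150L)];
        System.out.println(block.getHosts()[0] + ":" + block.getPorts()[0]);
    }
}

// Sketch of the proposed block-info record (Writable support omitted).
final class BlockInfo {
    private final long start;
    private final String[] hosts;
    private final int[] ports; // assumption: ports[i] belongs to hosts[i]

    BlockInfo(long start, String[] hosts, int[] ports) {
        if (hosts.length != ports.length) {
            throw new IllegalArgumentException("hosts and ports must have the same length");
        }
        this.start = start;
        this.hosts = hosts.clone();
        this.ports = ports.clone();
    }

    long getStart() { return start; }

    String[] getHosts() { return hosts.clone(); }

    int[] getPorts() { return ports.clone(); }
}
```

With host names alone, the two replicas of the first sample block are indistinguishable; the port is what lets a test tell which datanode holds which replica. Because each entry also carries its start offset, a caller can locate the block for any split from a single result array.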