[ https://issues.apache.org/jira/browse/HADOOP-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491779 ]
Owen O'Malley commented on HADOOP-1296:
---------------------------------------

> Another question, does getFileHints still need the parameters "start" and
> "len"?

I think it is good to allow the user to ask about a subrange of a file, if that is all they are interested in and it doesn't add much complexity.

> Improve interface to FileSystem.getFileCacheHints
> -------------------------------------------------
>
>                 Key: HADOOP-1296
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1296
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Owen O'Malley
>         Assigned To: dhruba borthakur
>
> The FileSystem interface provides a very limited interface for finding the
> location of the data. The current method looks like:
>
>   String[][] getFileCacheHints(Path file, long start, long len) throws IOException
>
> which returns a list of "block info", where each block info consists of a list
> of host names. Because the hints don't include information about where the
> block boundaries are, map/reduce is required to call the name node for each
> split. I'd propose that we fix the naming a bit and make it:
>
>   public class BlockInfo implements Writable {
>     public long getStart();
>     public String[] getHosts();
>   }
>
>   BlockInfo[] getFileHints(Path file, long start, long len) throws IOException;
>
> so that map/reduce can query about the entire file and get the locations in a
> single call.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
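
To illustrate why block start offsets matter, here is a minimal, self-contained sketch of how a caller might map split offsets to hosts from a single result array. It does not use any Hadoop classes: the BlockInfo class below is a hypothetical stand-in for the proposed type (Writable serialization omitted), and FileHintsSketch, hostsForOffset, the host names, and the sizes are all made up for illustration. It also assumes the returned blocks are sorted by start offset, which the issue does not state.

```java
import java.util.Arrays;

public class FileHintsSketch {

    /** Hypothetical stand-in for the proposed BlockInfo (Writable plumbing omitted). */
    static final class BlockInfo {
        private final long start;
        private final String[] hosts;

        BlockInfo(long start, String[] hosts) {
            this.start = start;
            this.hosts = hosts;
        }

        long getStart() { return start; }
        String[] getHosts() { return hosts; }
    }

    /**
     * Returns the hosts of the last block whose start is at or before the offset.
     * Assumes blocks are sorted by start offset; returns an empty array if no
     * block starts at or before the offset.
     */
    static String[] hostsForOffset(BlockInfo[] blocks, long offset) {
        String[] hosts = new String[0];
        for (BlockInfo block : blocks) {
            if (block.getStart() > offset) {
                break;
            }
            hosts = block.getHosts();
        }
        return hosts;
    }

    public static void main(String[] args) {
        // Made-up data standing in for the result of one whole-file query.
        BlockInfo[] blocks = {
            new BlockInfo(0L, new String[] {"host1", "host2"}),
            new BlockInfo(64L * 1024 * 1024, new String[] {"host2", "host3"}),
        };
        long splitSize = 32L * 1024 * 1024;
        long fileLength = 100L * 1024 * 1024;

        // Every split is resolved locally, with no further lookups per split.
        for (long splitStart = 0; splitStart < fileLength; splitStart += splitSize) {
            String[] hosts = hostsForOffset(blocks, splitStart);
            System.out.println(splitStart + " -> " + Arrays.toString(hosts));
        }
    }
}
```

With only host lists and no start offsets, the loop above could not tell which block a given split falls into, which is the limitation the issue describes for the current String[][] return type.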