[ https://issues.apache.org/jira/browse/HADOOP-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491776 ]
Owen O'Malley commented on HADOOP-1296:
---------------------------------------

Maybe I was just trying to be too clever. :) Since the BlockInfos come in a list of presumably sequential blocks, the start of each block lets you determine how long the previous block is, except for the last; even the last is bounded by the length that the user passed in. It would work equally well to provide the lengths instead of the starts, as long as the first block in the returned list was reported with its length measured from the given offset. To me the two solutions seemed equivalent, except that the starts didn't need to be adjusted based on the user's input and the lengths did. What do other people think?

> Improve interface to FileSystem.getFileCacheHints
> -------------------------------------------------
>
>                 Key: HADOOP-1296
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1296
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Owen O'Malley
>         Assigned To: dhruba borthakur
>
> The FileSystem interface provides a very limited interface for finding the
> location of the data. The current method looks like:
>
>   String[][] getFileCacheHints(Path file, long start, long len) throws IOException
>
> which returns a list of "block info" where the block info consists of a list
> of host names. Because the hints don't include information about where the
> block boundaries are, map/reduce is required to call the name node for each
> split. I'd propose that we fix the naming a bit and make it:
>
>   public class BlockInfo implements Writable {
>     public long getStart();
>     public String[] getHosts();
>   }
>
>   BlockInfo[] getFileHints(Path file, long start, long len) throws IOException;
>
> so that map/reduce can query about the entire file and get the locations in a
> single call.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
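[Editor's note: for concreteness, here is a minimal sketch of what the proposed BlockInfo could look like as a complete Writable. The class name and the getStart()/getHosts() accessors come from the proposal above; the fields, constructors, and serialization layout are assumptions made for illustration. Note that org.apache.hadoop.io.Writable is an interface, so a concrete class would implement it rather than extend it.]

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;

    /**
     * Hypothetical fleshed-out version of the BlockInfo proposed in this
     * issue. The wire format (a long start followed by a counted list of
     * host names) is an assumption, not part of the proposal.
     */
    public class BlockInfo implements Writable {
      private long start;      // offset of this block within the file
      private String[] hosts;  // hosts holding a replica of this block

      public BlockInfo() {}    // Writables need a no-arg constructor for readFields

      public BlockInfo(long start, String[] hosts) {
        this.start = start;
        this.hosts = hosts;
      }

      public long getStart() { return start; }
      public String[] getHosts() { return hosts; }

      public void write(DataOutput out) throws IOException {
        out.writeLong(start);
        out.writeInt(hosts.length);
        for (String host : hosts) {
          Text.writeString(out, host);
        }
      }

      public void readFields(DataInput in) throws IOException {
        start = in.readLong();
        hosts = new String[in.readInt()];
        for (int i = 0; i < hosts.length; i++) {
          hosts[i] = Text.readString(in);
        }
      }
    }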
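[Editor's note: and here is a sketch of the length derivation the comment argues for: given BlockInfos with sequential starts plus the (start, len) the user passed to getFileHints, a caller can recover every block's effective length without another name-node call. The helper name effectiveLengths is hypothetical, and it reuses the hypothetical BlockInfo sketched above.]

    /**
     * Derives each returned block's effective length from consecutive
     * starts: block i ends where block i+1 begins, and the last block is
     * bounded by the end of the range the user asked for.
     */
    public static long[] effectiveLengths(BlockInfo[] blocks,
                                          long userStart, long userLen) {
      long rangeEnd = userStart + userLen;
      long[] lengths = new long[blocks.length];
      for (int i = 0; i < blocks.length; i++) {
        // The first block may begin before the requested offset, so its
        // length is measured from the offset the user actually gave.
        long from = Math.max(blocks[i].getStart(), userStart);
        // Every block but the last ends where its successor starts; the
        // last is capped by the end of the requested range.
        long to = (i + 1 < blocks.length) ? blocks[i + 1].getStart() : rangeEnd;
        lengths[i] = to - from;
      }
      return lengths;
    }

This one-pass conversion is why the two encodings are equivalent: starts become lengths for free, except that the first block's length must be adjusted against the user's offset, which is the adjustment the comment mentions.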