Hi all, I sent an email to user@ but no one there was able to answer my question. I hope you don't mind me emailing hdfs-dev@ about it.
I'm submitting a proposal to Google Summer of Code to add HDFS support to Disco, an Erlang MapReduce system. We're looking at using WebHDFS. As with Hadoop, we need to know where a file's blocks are located so that we can schedule tasks accordingly.

WebHDFS does seem to provide some information about data locality. When you request a file from the namenode, you are redirected to the datanode containing the first block of that file.

1) What happens if you specify an offset that falls in the third block? Are you redirected to the datanode containing that block, or are you still redirected to the datanode containing the file's first block?

2) Is there any reason that WebHDFS does not support requesting the block locations?

3) Would the HDFS community be interested in a patch that adds support for a) reporting block locations and b) requesting blocks from the appropriate datanodes (if that is not already supported)?

I believe this would be of interest to other projects that are using WebHDFS.

Thank you!
RJ
--
em rnowl...@gmail.com
c 954.496.2314
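
P.S. To make question 1 concrete, a minimal Python sketch of such a probe might look like the following. It assumes the documented WebHDFS OPEN operation with its offset parameter, and that the namenode answers with a 307 redirect whose Location header names a datanode. The function names are hypothetical, and the host, port, and path passed in are placeholders that would need to match a real cluster.

```python
import http.client
from urllib.parse import quote, urlencode, urlsplit


def build_open_path(hdfs_path, offset=0):
    """Return the request path for a WebHDFS OPEN call starting at `offset`."""
    params = {"op": "OPEN", "offset": offset}
    return "/webhdfs/v1" + quote(hdfs_path) + "?" + urlencode(params)


def host_from_location(location):
    """Extract the host name from a redirect's Location header."""
    return urlsplit(location).hostname


def probe_redirect(namenode_host, namenode_port, hdfs_path, offset):
    """Ask the namenode to OPEN the file at `offset`; return the redirect host.

    http.client does not follow redirects on its own, so the namenode's
    response can be inspected directly. Assumption: the reply is a 307.
    """
    conn = http.client.HTTPConnection(namenode_host, namenode_port, timeout=10)
    try:
        conn.request("GET", build_open_path(hdfs_path, offset))
        resp = conn.getresponse()
        resp.read()  # drain the (normally empty) redirect body
        if resp.status != 307:
            raise RuntimeError("expected a 307 redirect, got %d" % resp.status)
        return host_from_location(resp.getheader("Location"))
    finally:
        conn.close()
```

Calling probe_redirect once with offset 0 and once with an offset of twice the file's block size (the blockSize field of a GETFILESTATUS response), then comparing the two hosts, would show whether the redirect follows the offset. Because blocks are replicated, identical hosts would not be conclusive on their own.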