This seems to be the case. I don't think there is any specific reason not to read across the block boundary...
Even if HDFS does read across block boundaries, it is still not a good idea to ignore the JavaDoc for read(): it may return fewer bytes than you asked for. If you want all the bytes read, you should use a while loop or one of the readFully() variants. For example, if you later change your code by wrapping a BufferedInputStream around 'in', you could still get partial reads even though HDFS itself read all the data.
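Something like the sketch below, for instance (a minimal example only, not your exact code; the class name is made up, and the seek offset and buffer size are taken from your snippet further down). readFully() is inherited from DataInputStream, and the loop shows the equivalent 'while' style:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadAcrossBlock {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    FSDataInputStream fin = fs.open(new Path(args[0]));
    fin.seek(64L * 1024 * 1024 - 10);

    byte[] buffer = new byte[32 * 1024];

    // Option 1: readFully() fills the whole buffer or throws
    // EOFException if the stream ends first.
    // fin.readFully(buffer);

    // Option 2: loop until the buffer is full or EOF is reached.
    int off = 0;
    while (off < buffer.length) {
      int n = fin.read(buffer, off, buffer.length - off);
      if (n < 0) {
        break; // end of file
      }
      off += n;
    }

    System.out.println(off); // total bytes actually read
    fin.close();
  }
}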
Raghu.

forbbs forbbs wrote:
The Hadoop version is 0.19.0. My file is larger than 64MB, and the block size is 64MB. The output of the code below is '10'. May I read across the block boundary, or should I use 'while (left..){}' style code?

public static void main(String[] args) throws IOException {
  Configuration conf = new Configuration();
  FileSystem fs = FileSystem.get(conf);
  FSDataInputStream fin = fs.open(new Path(args[0]));
  fin.seek(64*1024*1024 - 10);
  byte[] buffer = new byte[32*1024];
  int len = fin.read(buffer);
  //int len = fin.read(buffer, 0, 128);
  System.out.println(len);
  fin.close();
}