This seems to be the case. I don't think there is any specific reason not to read across the block boundary...

Even if HDFS does read across the blocks, it is still not a good idea to ignore the JavaDoc for read(). If you want all the bytes read, you should use a while loop or one of the readFully() variants. For example, if you later change your code by wrapping a BufferedInputStream around 'in', you would still get partial reads even though HDFS reads all the data.
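
A minimal sketch of that loop (not from the original post; it assumes 'fin' and 'buffer' are set up as in the program quoted below):

  int off = 0;
  while (off < buffer.length) {
    int n = fin.read(buffer, off, buffer.length - off);
    if (n < 0) {
      break;              // end of file reached before the buffer filled
    }
    off += n;             // a short read just advances the offset
  }
  // Or let DataInputStream do the looping; this throws EOFException
  // if the stream ends before the buffer is full:
  // fin.readFully(buffer);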

Raghu.

forbbs forbbs wrote:
The Hadoop version is 0.19.0.
My file is larger than 64MB, and the block size is 64MB.

The output of the code below is '10'. Can I read across the block
boundary, or should I use 'while (left..){}' style code?

 import java.io.IOException;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FSDataInputStream;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;

 public class BlockBoundaryRead
 {
  public static void main(String[] args) throws IOException
  {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    FSDataInputStream fin = fs.open(new Path(args[0]));

    // seek to 10 bytes before the 64MB block boundary
    fin.seek(64*1024*1024 - 10);
    byte[] buffer = new byte[32*1024];
    int len = fin.read(buffer);    // returns 10: the read stops at the block boundary
    //int len = fin.read(buffer, 0, 128);
    System.out.println(len);

    fin.close();
  }
 }
