Re: FSDataInputStream.read(byte[]) only reads to a block boundary?
This seems to be the case. I don't think there is any specific reason not to read across the block boundary...

Even if HDFS does read across the blocks, it is still not a good idea to ignore the JavaDoc for read(). If you want all the bytes read, then you should use a while loop or one of the readFully() variants. For example, if you later change your code by wrapping a BufferedInputStream around 'in', you would still get partial reads even if HDFS read all the data.

Raghu.

forbbs forbbs wrote:
> The Hadoop version is 0.19.0. My file is larger than 64MB, and the block
> size is 64MB. The output of the code below is '10'. May I read across the
> block boundary? Or should I use 'while (left..){}' style code?
>
>   public static void main(String[] args) throws IOException {
>     Configuration conf = new Configuration();
>     FileSystem fs = FileSystem.get(conf);
>     FSDataInputStream fin = fs.open(new Path(args[0]));
>     fin.seek(64*1024*1024 - 10);
>     byte[] buffer = new byte[32*1024];
>     int len = fin.read(buffer);
>     //int len = fin.read(buffer, 0, 128);
>     System.out.println(len);
>     fin.close();
>   }
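The fill loop Raghu describes can be sketched against plain java.io.InputStream, so it works the same whether the stream is an FSDataInputStream or anything wrapped around it. The class and helper name below are my own, for illustration; in real code you could instead call the readFully() variants that FSDataInputStream inherits from DataInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadFullyDemo {
    // Loop until the buffer is full or EOF, since a single read() may
    // legitimately return fewer bytes than requested (e.g. at an HDFS
    // block boundary, or from a BufferedInputStream).
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) break;   // EOF before the buffer filled
            off += n;
        }
        return off;             // total bytes actually read
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an HDFS stream: any InputStream behaves the same.
        InputStream in = new ByteArrayInputStream(new byte[100000]);
        byte[] buf = new byte[32 * 1024];
        System.out.println(readFully(in, buf));  // 32768
    }
}
```

The contract of InputStream.read() only promises at least one byte (or -1 at EOF), so any caller that needs a full buffer has to loop like this.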
Re: FSDataInputStream.read(byte[]) only reads to a block boundary?
This kind of partial read is often used by the OS to return to your application as soon as possible when trying to read more data would block, in case you can begin computing on the partial data. In some applications it's not useful, but when you can begin computing on partial data, it allows the OS to overlap IO with your computation, improving throughput. I think FSDataInputStream returns at the block boundary for the same reason.

On Sun, Jun 28, 2009 at 11:16 AM, Raghu Angadi rang...@yahoo-inc.com wrote:
> This seems to be the case. I don't think there is any specific reason not
> to read across the block boundary... Even if HDFS does read across the
> blocks, it is still not a good idea to ignore the JavaDoc for read(). If
> you want all the bytes read, then you should have a while loop or one of
> the readFully() variants.
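The overlap Matei describes comes from consuming whatever read() hands back instead of insisting on a full buffer. A minimal sketch of that pattern (the stream and the per-chunk "work" here are placeholders, not HDFS-specific):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ChunkDemo {
    public static void main(String[] args) throws IOException {
        // Stand-in for an HDFS stream.
        InputStream in = new ByteArrayInputStream("hello hdfs\n".getBytes());
        byte[] buf = new byte[4 * 1024];
        long total = 0;
        int n;
        // Process each partial read as it arrives rather than waiting for
        // the buffer to fill; IO and computation can then overlap.
        while ((n = in.read(buf)) != -1) {
            total += n;  // stand-in for real per-chunk work
        }
        System.out.println(total);  // 11
    }
}
```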
Re: FSDataInputStream.read(byte[]) only reads to a block boundary?
On Sun, Jun 28, 2009 at 3:01 PM, Matei Zaharia ma...@cloudera.com wrote:
> This kind of partial read is often used by the OS to return to your
> application as soon as possible if trying to read more data would block,
> in case you can begin computing on the partial data. In some
> applications, it's not useful, but when you can begin computing on
> partial data, it allows the OS to overlap IO with your computation,
> improving throughput. I think FSDataInputStream returns at the block
> boundary for the same reason.

It is very unusual, nay, unexpected to the point of bizarre, for the OS to do so on a regular file. Typically that is only seen on network fds.