On Sun, Jun 28, 2009 at 3:01 PM, Matei Zaharia <ma...@cloudera.com> wrote:

> The OS often uses this kind of partial read to return to your application
> as soon as possible when reading more data would block, so that you can
> begin computing on the partial data. For some applications that is not
> useful, but when it is, it lets the OS overlap I/O with your computation,
> improving throughput. I think FSDataInputStream returns at the block
> boundary for the same reason.
>

It is very unusual, nay, unexpected to the point of bizarre, for the OS to
do so on a regular file. Partial reads like this are typically seen only on
network fds.



>
> On Sun, Jun 28, 2009 at 11:16 AM, Raghu Angadi <rang...@yahoo-inc.com> wrote:
>
> >
> > This seems to be the case. I don't think there is any specific reason not
> > to read across the block boundary...
> >
> > Even if HDFS did read across blocks, it would still not be a good idea to
> > ignore the JavaDoc for read(). If you want all the bytes read, you should
> > use a while loop or one of the readFully() variants. For example, if you
> > later change your code by wrapping a BufferedInputStream around 'in', you
> > would still get partial reads even if HDFS reads all the data.
> >
> > Raghu.
> >
> >
> > forbbs forbbs wrote:
> >
> >> The hadoop version is 0.19.0.
> >> My file is larger than 64MB, and the block size is 64MB.
> >>
> >> The output of the code below is '10'. Can I read across the block
> >> boundary, or should I use 'while (left..){}' style code?
> >>
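> >>  import java.io.IOException;
> >>  import org.apache.hadoop.conf.Configuration;
> >>  import org.apache.hadoop.fs.FSDataInputStream;
> >>  import org.apache.hadoop.fs.FileSystem;
> >>  import org.apache.hadoop.fs.Path;
> >>
> >>  public class ReadTest  // class name added for illustration
> >>  {
> >>    public static void main(String[] args) throws IOException
> >>    {
> >>      Configuration conf = new Configuration();
> >>      FileSystem fs = FileSystem.get(conf);
> >>      FSDataInputStream fin = fs.open(new Path(args[0]));
> >>
> >>      // Position 10 bytes before the end of the first 64MB block.
> >>      fin.seek(64*1024*1024 - 10);
> >>      byte[] buffer = new byte[32*1024];
> >>      // A single read() may return fewer bytes than requested.
> >>      int len = fin.read(buffer);
> >>      //int len = fin.read(buffer, 0, 128);
> >>      System.out.println(len);
> >>
> >>      fin.close();
> >>    }
> >>  }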
> >>
> >
> >
>
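
For reference, the "while loop or readFully()" advice quoted above can be
sketched roughly as follows. This is only an illustration using plain
java.io, not HDFS code: the class ReadLoop and the helper shortReads() are
made-up names, and shortReads() merely simulates a stream that returns at
most a few bytes per call, much as the read at the block boundary did. On
the real stream, the readFully(byte[]) method that FSDataInputStream
inherits from DataInputStream should serve the same purpose.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class ReadLoop {
    /** Reads exactly len bytes into buf starting at off, looping over short reads. */
    static void readFully(InputStream in, byte[] buf, int off, int len) throws IOException {
        int done = 0;
        while (done < len) {
            int n = in.read(buf, off + done, len - done);
            if (n < 0) {
                // The stream ended before the requested number of bytes arrived.
                throw new EOFException("stream ended after " + done + " of " + len + " bytes");
            }
            done += n;
        }
    }

    /** Hypothetical stand-in: a stream that hands back at most `chunk` bytes per read(). */
    static InputStream shortReads(byte[] data, final int chunk) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, chunk));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100];
        InputStream in = shortReads(data, 10);
        byte[] buf = new byte[64];

        int first = in.read(buf);   // one call: a partial read
        System.out.println(first);  // prints 10

        // Fill the rest of the buffer regardless of how short each read is.
        readFully(in, buf, first, buf.length - first);
    }
}
```

The loop depends only on the InputStream contract, so it keeps working if
the stream is later wrapped in a BufferedInputStream or swapped for another
implementation.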
