Thanks for the clarifications guys :)
Mark

On Mon, Oct 10, 2011 at 8:27 AM, Uma Maheswara Rao G 72686 <
[email protected]> wrote:

> I think the link below can give you more info about it.
>
> http://developer.yahoo.com/blogs/hadoop/posts/2009/08/the_anatomy_of_hadoop_io_pipel/
> Nice explanation by Owen here.
>
> Regards,
> Uma
>
> ----- Original Message -----
> From: Yang Xiaoliang <[email protected]>
> Date: Wednesday, October 5, 2011 4:27 pm
> Subject: Re: hadoop input buffer size
> To: [email protected]
>
> > Hi,
> >
> > Hadoop neither reads one line at a time nor fetches dfs.block.size
> > worth of lines into a buffer.
> > For the TextInputFormat, it actually reads io.file.buffer.size bytes
> > of text into a buffer on each read;
> > this can be seen in the Hadoop source file LineReader.java.
> >
> >
> >
> > 2011/10/5 Mark question <[email protected]>
> >
> > > Hello,
> > >
> > >  Correct me if I'm wrong, but when a program opens n-files at
> > the same time
> > > to read from, and start reading from each file at a time 1 line
> > at a time.
> > > Isn't hadoop actually fetching dfs.block.size of lines into a
> > buffer? and
> > > not actually one line.
> > >
> > >  If this is correct, I set up my dfs.block.size = 3MB and each
> > line takes
> > > about 650 bytes only, then I would assume the performance for
> > reading> 1-4000
> > > lines would be the same, but it isn't !  Do you know a way to
> > find #n of
> > > lines to be read at once?
> > >
> > > Thank you,
> > > Mark
> > >
> >
>
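For anyone reading this thread later, here is a minimal sketch (plain Java, not the actual Hadoop LineReader API) of the buffering behavior Yang describes: lines are carved out of a fixed-size read buffer, so the underlying stream is hit once per buffer-load of bytes (io.file.buffer.size in Hadoop), not once per line and not once per dfs.block.size. The class name and byte-at-a-time line splitting are illustrative simplifications.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplification of Hadoop's LineReader: the stream is read
// one fixed-size buffer at a time, and lines are split out of the buffer.
public class BufferedLineReader {

    public static List<String> readLines(InputStream in, int bufferSize)
            throws IOException {
        List<String> lines = new ArrayList<>();
        byte[] buffer = new byte[bufferSize];
        StringBuilder current = new StringBuilder();
        int n;
        while ((n = in.read(buffer)) != -1) {   // one read per buffer, not per line
            for (int i = 0; i < n; i++) {
                if (buffer[i] == '\n') {
                    lines.add(current.toString());
                    current.setLength(0);
                } else {
                    // (char) cast is ASCII-only; fine for this sketch
                    current.append((char) buffer[i]);
                }
            }
        }
        if (current.length() > 0) {
            lines.add(current.toString());      // trailing line without newline
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "first\nsecond\nthird".getBytes(StandardCharsets.UTF_8);
        // A 4-byte buffer still yields whole lines: line boundaries are
        // independent of the read granularity.
        List<String> lines = readLines(new ByteArrayInputStream(data), 4);
        System.out.println(lines);
    }
}
```

This is why reading 1 line versus 4000 lines need not cost the same: each buffer refill is a separate read against the stream, so total cost scales with bytes consumed divided by the buffer size, not with dfs.block.size.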
