Thanks for the clarifications guys :)

Mark

On Mon, Oct 10, 2011 at 8:27 AM, Uma Maheswara Rao G 72686 <[email protected]> wrote:
> I think the link below can give you more info about it:
>
> http://developer.yahoo.com/blogs/hadoop/posts/2009/08/the_anatomy_of_hadoop_io_pipel/
>
> Nice explanation by Owen here.
>
> Regards,
> Uma
>
> ----- Original Message -----
> From: Yang Xiaoliang <[email protected]>
> Date: Wednesday, October 5, 2011 4:27 pm
> Subject: Re: hadoop input buffer size
> To: [email protected]
>
> > Hi,
> >
> > Hadoop neither reads one line at a time, nor fetches dfs.block.size
> > worth of lines into a buffer.
> > Actually, for TextInputFormat, it reads io.file.buffer.size bytes of
> > text into a buffer each time;
> > this can be seen in the Hadoop source file LineReader.java
> >
> >
> > 2011/10/5 Mark question <[email protected]>
> >
> > > Hello,
> > >
> > > Correct me if I'm wrong, but when a program opens n files at the
> > > same time and starts reading from each file one line at a time,
> > > isn't Hadoop actually fetching dfs.block.size worth of lines into
> > > a buffer, and not just one line?
> > >
> > > If this is correct: I set dfs.block.size = 3MB, and each line takes
> > > only about 650 bytes, so I would assume the performance for reading
> > > 1-4000 lines would be the same, but it isn't! Do you know a way to
> > > find the number n of lines that are read at once?
> > >
> > > Thank you,
> > > Mark
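To make the buffering behavior Yang describes concrete, here is a minimal sketch of the pattern: the stream is consumed in fixed-size chunks (analogous to io.file.buffer.size) and lines are then split out of the in-memory buffer, so one read() call typically covers many lines. This is a hypothetical illustration, not Hadoop's actual LineReader; the class and method names are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class BufferedLineSketch {
    // Read the whole stream in bufferSize-byte chunks and split into lines.
    static List<String> readLines(InputStream in, int bufferSize) throws IOException {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        byte[] buf = new byte[bufferSize];
        int n;
        // Each read() pulls up to bufferSize bytes from the stream at once,
        // regardless of how many lines those bytes happen to contain.
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                if (buf[i] == '\n') {
                    lines.add(current.toString());
                    current.setLength(0);
                } else {
                    current.append((char) buf[i]);
                }
            }
        }
        // Keep a trailing line that has no terminating newline.
        if (current.length() > 0) lines.add(current.toString());
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "line1\nline2\nline3\n".getBytes();
        // A 4 KB buffer swallows all three lines in a single read() call,
        // which is why per-line timings do not scale the way Mark expected.
        System.out.println(readLines(new ByteArrayInputStream(data), 4096));
    }
}
```

With a buffer far larger than the average line (3 MB vs. ~650 bytes in Mark's case), the number of lines served from one fill is roughly bufferSize divided by the average line length, which is why timing 1 line vs. a few thousand lines mostly measures buffer refills, not lines.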
