Hello,

  Correct me if I'm wrong, but when a program opens n files at the same time
and reads from each of them one line at a time, isn't Hadoop actually
fetching dfs.block.size worth of data into a buffer, rather than just one
line?
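
  Here is roughly what my reading loop looks like (a minimal sketch; the
class name and the single-file loop are placeholders for my actual n-file
code):

  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class LineReadTest {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          FileSystem fs = FileSystem.get(conf);
          // Open one file; my real program keeps n of these open at once.
          BufferedReader reader = new BufferedReader(
                  new InputStreamReader(fs.open(new Path(args[0]))));
          String line;
          while ((line = reader.readLine()) != null) {
              // Each line is ~650 bytes. Does every readLine() go back to
              // the datanode, or does the client buffer ahead?
          }
          reader.close();
      }
  }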

  If this is correct: I set dfs.block.size = 3MB, and each line takes only
about 650 bytes, so roughly 3 MB / 650 bytes ≈ 4,800 lines should fit in a
single block. I would therefore expect the performance of reading anywhere
from 1 to 4,000 lines to be about the same, but it isn't!  Do you know a way
to find out how many lines are actually read at once?

Thank you,
Mark
