Hello, Correct me if I'm wrong, but when a program opens n files at the same time and reads from each of them one line at a time, isn't Hadoop actually fetching dfs.block.size worth of data into a buffer, rather than just one line? A minimal sketch of the read loop I have in mind is below.
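Here is roughly what I'm doing, as a self-contained sketch (the path /data/input.txt and the class name are just placeholders, and the line count comes from the command line):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LineReadTiming {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/data/input.txt"); // placeholder input file

        int linesToRead = Integer.parseInt(args[0]); // e.g. 1 ... 4000

        long start = System.nanoTime();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(fs.open(path)))) {
            // Read one line at a time; my question is how much data the
            // DFS client actually pulls into its buffer per read.
            for (int i = 0; i < linesToRead; i++) {
                if (reader.readLine() == null) break;
            }
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(linesToRead + " lines in " + elapsedMs + " ms");
    }
}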
If that is correct: I set dfs.block.size = 3MB, and each line is only about 650 bytes, so I would expect the performance of reading anywhere from 1 to 4,000 lines to be roughly the same, but it isn't! Is there a way to find out how many lines are actually fetched at once? Thank you, Mark