The class responsible for reading records as lines off a file will seek into the next block in sequence until it hits a newline. This behavior, and how it affects the Map tasks, is documented here (see the TextInputFormat example): http://wiki.apache.org/hadoop/HadoopMapReduce
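To illustrate, here is a minimal sketch (in Python, not Hadoop's actual Java code) of the convention a line-oriented record reader follows: every split except the first skips the partial first line (the previous split finishes it), and every split reads past its nominal end to complete its last line. The function name and split sizes below are made up for the illustration; the point is that each line ends up with exactly one mapper even when it straddles a block boundary.

```python
def read_lines_for_split(data: bytes, start: int, length: int):
    """Return the lines 'owned' by the split [start, start+length).

    Mimics the line-reader convention: skip the partial first line
    unless this split starts at offset 0, and read past the split's
    end to finish the last line.
    """
    end = start + length
    pos = start
    if start != 0:
        # The previous split owns the line in progress at our start
        # offset; skip ahead to the character after the next newline.
        nl = data.find(b"\n", start)
        if nl == -1:
            return []
        pos = nl + 1
    lines = []
    while pos < end and pos < len(data):
        nl = data.find(b"\n", pos)
        if nl == -1:
            lines.append(data[pos:])  # last line, no trailing newline
            pos = len(data)
        else:
            # May read past `end` into the "next block" to finish the line.
            lines.append(data[pos:nl])
            pos = nl + 1
    return lines


# Three 5-byte "splits" over data where lines cross split boundaries:
data = b"aaa\nbbbbbb\ncc\n"
collected = []
for s in range(0, len(data), 5):
    collected += read_lines_for_split(data, s, 5)
# Each line is delivered whole, exactly once:
# collected == [b"aaa", b"bbbbbb", b"cc"]
```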
On Sat, Mar 5, 2011 at 1:54 AM, Kelly Burkhart <[email protected]> wrote:
> On Fri, Mar 4, 2011 at 1:42 PM, Harsh J <[email protected]> wrote:
>> HDFS does not operate with records in mind.
>
> So does that mean that HDFS will break a file at exactly <blocksize>
> bytes? Map/Reduce *does* operate with records in mind, so what
> happens to the split record? Does HDFS put the fragments back
> together and deliver the reconstructed record to one map? Or are both
> fragments and consequently the whole record discarded?
>
> Thanks,
>
> -Kelly

--
Harsh J
www.harshj.com
