Harsh,

Thanks for the response.
From http://wiki.apache.org/hadoop/HadoopMapReduce:

> For example TextInputFormat will read the last line of the FileSplit past
> the split boundary and when reading other than the first FileSplit,
> TextInputFormat ignores the content up to the first newline.

When the first record in a split other than the first one is complete and
does not span the split boundary, then based on the above logic this
particular record would not be processed by any mapper.

Thanks,
Praveen

Cloudera Certified Developer for Apache Hadoop CDH4 (95%)
http://www.thecloudavenue.com/
http://stackoverflow.com/users/614157/praveen-sripati

If you aren’t taking advantage of big data, then you don’t have big data,
you have just a pile of data.

On Fri, Jan 25, 2013 at 12:52 AM, Harsh J <ha...@cloudera.com> wrote:
> Hi Praveen,
>
> This is explained at http://wiki.apache.org/hadoop/HadoopMapReduce
> [Map section].
>
> On Thu, Jan 24, 2013 at 10:20 PM, Praveen Sripati
> <praveensrip...@gmail.com> wrote:
> > Hi,
> >
> > HDFS splits the file across record boundaries. So, how does the mapper
> > processing the second block (b2) determine that the first record is
> > incomplete and should process starting from the second record in the
> > block (b2)?
> >
> > Thanks,
> > Praveen
>
> --
> Harsh J
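
P.S. To make the case in my question concrete, here is a rough, self-contained
sketch in plain Java of the behaviour the wiki describes. It is not the actual
Hadoop code (the real logic is in LineRecordReader), and the file name and
split offsets below are made up for illustration, but it shows where the
boundary case gets decided: whether the reader for the previous split also
consumes a line that starts exactly at the split boundary.

import java.io.IOException;
import java.io.RandomAccessFile;

/**
 * Rough sketch of the TextInputFormat/LineRecordReader idea from the wiki,
 * NOT the actual Hadoop implementation:
 *  - a reader for any split other than the first one discards everything
 *    up to and including the first newline it sees, and
 *  - a reader keeps emitting lines as long as the line *starts* at or
 *    before the split end, so the last line it emits may run past the
 *    split boundary into the next split.
 */
public class SplitLineSketch {

    static void readSplit(String path, long start, long end) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek(start);
            long pos = start;

            // Any split but the first one: skip up to the first newline,
            // even if the skipped text happens to be a complete record.
            if (start != 0) {
                file.readLine();
                pos = file.getFilePointer();
            }

            // Note the "<=": a line that starts exactly at 'end' is still
            // read by this split's reader. If this were "<", a record that
            // begins exactly on the boundary would be read by neither
            // reader -- which is exactly the case I am asking about.
            while (pos <= end) {
                String line = file.readLine();
                if (line == null) {
                    break; // end of file
                }
                System.out.println("split@" + start + " record: " + line);
                pos = file.getFilePointer();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Made-up file and split offsets, purely for illustration.
        readSplit("input.txt", 0, 50);
        readSplit("input.txt", 50, 100);
    }
}

If my reading is right, something equivalent to that "<=" in the real
LineRecordReader is what keeps a record that starts exactly on a split
boundary from being dropped, but I would like to confirm that against the
actual source.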