Thanks! I just thought it was better to post to several groups at once, since I didn't know where the question belonged :)
On Fri, May 27, 2011 at 10:04 AM, Harsh J <ha...@cloudera.com> wrote:
> Mohit,
>
> Please do not cross-post a question to multiple lists unless you're
> announcing something.
>
> What you describe does not happen, and the way splitting is done
> for text files is explained in good detail here:
> http://wiki.apache.org/hadoop/HadoopMapReduce
>
> Hope this clears up your doubt :)
>
> On Fri, May 27, 2011 at 10:25 PM, Mohit Anchlia <mohitanch...@gmail.com>
> wrote:
>> I am new to Hadoop, and from what I understand, by default Hadoop splits
>> the input into blocks. This might split a single line (record) into
>> two pieces that get spread across two maps. For example, the line
>> "abcd" might get split into "ab" and "cd". How can one prevent this in
>> Hadoop and Pig? I am looking for examples showing how I can specify
>> my own split so that the input is divided logically on the record
>> delimiter rather than on the block size. For some reason I haven't
>> been able to find good examples online.
>
> --
> Harsh J
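For anyone finding this thread later, here is a rough simulation of why a line is never cut in half, written in Python for brevity rather than Hadoop's Java. It assumes the behavior of the newline-delimited text input (Hadoop's `LineRecordReader`, used by `TextInputFormat`) described on the wiki page above: splits are plain byte ranges that ignore line boundaries, but a reader whose split does not start at offset 0 skips forward past the first newline, and every reader keeps reading past its split's end to finish the last line it started. The helpers `split_ranges` and `read_split` are hypothetical names for illustration only, not Hadoop APIs, and the exact boundary handling differs slightly between Hadoop versions.

```python
def split_ranges(total_len, split_size):
    """Fixed-size byte ranges [start, end) that ignore line boundaries (like HDFS blocks)."""
    return [(s, min(s + split_size, total_len)) for s in range(0, total_len, split_size)]


def read_split(data, start, end):
    """Return the complete lines emitted by a reader assigned byte range [start, end).

    Hypothetical helper modeled on the idea behind Hadoop's LineRecordReader.
    """
    pos = start
    if start != 0:
        # Back up one byte and discard everything through the next newline.
        # If data[start - 1] is '\n', only that byte is discarded, so a line that
        # begins exactly at `start` is kept; otherwise the partial line at the
        # front of this split is left to the previous split's reader.
        nl = data.find(b"\n", start - 1)
        pos = len(data) if nl == -1 else nl + 1
    records = []
    # Emit every line that *starts* before `end`, reading past `end` if needed
    # so that the last line is always returned whole.
    while pos < end:
        nl = data.find(b"\n", pos)
        line_end = len(data) if nl == -1 else nl
        records.append(data[pos:line_end].decode())
        pos = line_end + 1
    return records


data = b"abcd\nefgh\nij\n"
# Use a split size deliberately smaller than a line, so splits cut through records.
splits = split_ranges(len(data), 3)
per_split = [read_split(data, s, e) for s, e in splits]
print(per_split)  # [['abcd'], ['efgh'], [], ['ij'], []]
```

Some splits end up emitting no records at all, but every line is read exactly once and never as a fragment, which is why no custom split is needed for newline-delimited text.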