Re: Using own InputSplit

Mohit Anchlia Fri, 27 May 2011 10:09:16 -0700

Thanks! I just thought it was better to post to multiple groups at once,
since I didn't know where the question belongs :)


On Fri, May 27, 2011 at 10:04 AM, Harsh J <ha...@cloudera.com> wrote:
> Mohit,
>
> Please do not cross-post a question to multiple lists unless you're
> announcing something.
>
> What you describe does not happen. The way splitting is done for
> text files is explained in good detail here:
> http://wiki.apache.org/hadoop/HadoopMapReduce
>
> Hope this solves your doubt :)
>
> On Fri, May 27, 2011 at 10:25 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
>> I am new to Hadoop, and from what I understand, by default Hadoop splits
>> the input into blocks. This might result in a line (one record) being
>> split into 2 pieces and spread across 2 maps. For example, the line
>> "abcd" might get split into "ab" and "cd". How can one prevent this in
>> Hadoop and Pig? I am looking for examples that show how I can specify
>> my own split so that it splits logically on the record delimiter and
>> not on the block size. For some reason I am not able to find the right
>> examples online.
>>
>
>
>
> --
> Harsh J
>
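
For anyone who wants to see why a line like "abcd" does not end up as "ab"
in one map and "cd" in another, here is a minimal Python sketch of the
convention a line-oriented record reader can follow. It is a simulation
only, not Hadoop's actual code: the names read_split and split_ranges are
hypothetical, and the exact boundary rule is an assumption modelled on how
I understand the text record reader to behave. A split that does not start
at byte 0 throws away everything up to the first newline, and every split
keeps reading past its own end until the line it is on is complete.

```python
def read_split(data, start, end):
    """Return the whole lines that belong to the byte range [start, end)."""
    pos = start
    if start != 0:
        # Not the first split: discard the (possibly partial) first line.
        # The reader of the previous split reads it in full.
        nl = data.find(b"\n", pos)
        pos = len(data) if nl == -1 else nl + 1
    records = []
    # Keep any line that begins at or before `end`, even if it runs past it.
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl
        records.append(data[pos:stop])
        pos = stop + 1
    return records


def split_ranges(length, size):
    """Cut [0, length) into fixed-size byte ranges, ignoring line boundaries."""
    return [(s, min(s + size, length)) for s in range(0, length, size)]


if __name__ == "__main__":
    data = b"abcd\nefgh\nij\n"
    # A 3-byte split size cuts straight through "abcd" and "efgh".
    for start, end in split_ranges(len(data), 3):
        print((start, end), read_split(data, start, end))
```

Even with a split size that cuts through the middle of a line, each line
comes back whole from exactly one split, so no custom InputSplit is needed
for plain newline-delimited text.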
