Re: How to maintain record boundaries

Harsh J Fri, 11 May 2012 05:20:25 -0700

Shreya,

This has been asked several times before. The way TextInputFormat (for
one example) handles it is explained at
http://wiki.apache.org/hadoop/HadoopMapReduce in the Map section. If
you are writing a custom reader, feel free to follow the same steps:
you basically need to keep reading into the next block until you reach
the end-of-record marker, and not limit yourself to one-block reads.
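
To make the idea concrete, here is a minimal standalone sketch in plain
Python (not Hadoop code). It only imitates the convention a line-oriented
reader can follow, assuming newline-terminated records: every split except
the first skips ahead past the first newline, and every split keeps reading
past its own end until it finishes the record it is on. The names
split_ranges and read_split are made up for this illustration.

```python
def split_ranges(length, block_size):
    """Cut [0, length) into consecutive (start, end) byte ranges."""
    return [(s, min(s + block_size, length))
            for s in range(0, length, block_size)]


def read_split(data, start, end):
    """Yield the newline-terminated records owned by the split [start, end).

    A record belongs to a split when it begins at an offset p with
    start < p <= end (or p == 0 for the first split).
    """
    pos = start
    if start != 0:
        # Not the first split: the previous reader finishes whatever record
        # straddles (or begins exactly at) `start`, so skip past it.
        nl = data.find(b"\n", start)
        if nl == -1:
            return
        pos = nl + 1
    # Read on past `end` if needed, so the last record is returned whole.
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        if nl == -1:
            yield data[pos:]  # final record without a trailing newline
            return
        yield data[pos:nl]
        pos = nl + 1


data = b"alpha\nbravo\ncharlie\ndelta\n"
for start, end in split_ranges(len(data), 8):
    print((start, end), list(read_split(data, start, end)))
```

With an 8-byte "block" the record "bravo" straddles the first boundary, yet
it comes back whole from the first split and the second split does not
repeat it. In a real job the bytes past the split end may live on another
node, and the reader simply fetches them through the filesystem.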


All input formats provided in MR handle this already for you, and you
needn't worry about this unless you're implementing a whole new reader
from scratch.

On Fri, May 11, 2012 at 5:45 PM,  <shreya....@cognizant.com> wrote:
> Hi
>
> When we store data into HDFS, it gets broken into small pieces and
> distributed across the cluster based on the block size for the file.
> While processing the data using an MR program, I want a particular record
> as a whole without it being split across nodes, but the data has already
> been split and stored in HDFS when I loaded the data.
> How would I make sure that my record doesn't get split, and how would my
> InputFormat make a difference now?
>
> Regards
> Shreya



-- 
Harsh J
