Shreya,

This has been asked several times before. The way TextInputFormat (as one example) handles it is explained at http://wiki.apache.org/hadoop/HadoopMapReduce in the Map section. If you are writing a custom reader, feel free to follow the same steps: you basically need to keep reading into the next block until you reach the end-of-record marker, rather than limiting yourself to single-block reads.
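To make the idea concrete, here is a minimal, Hadoop-free sketch of the boundary rule for newline-delimited records. It is similar in spirit to what a line-based reader does, but it is not the actual Hadoop code: the class name SplitBoundaryDemo, the method readSplit, and the tiny block size of 8 bytes are all made up for illustration. The rule it assumes is that a record belongs to the split in which its first byte lies, so a reader skips a leading partial record and reads past its own end to finish the last one.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of Hadoop.
class SplitBoundaryDemo {

    // Returns the complete records owned by the split [start, start + length).
    // A record is owned by the split that contains its first byte.
    static List<String> readSplit(byte[] data, int start, int length) {
        int end = Math.min(start + length, data.length);
        int pos = start;
        if (start != 0) {
            // Step back one byte and skip through the next newline. If a record
            // begins exactly at 'start', the byte before it is the newline and
            // nothing is lost; otherwise the partial record is skipped, because
            // the previous split's reader finishes it.
            pos = start - 1;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // first byte after the newline
        }
        List<String> records = new ArrayList<>();
        // Start a new record only if it begins inside this split...
        while (pos < end) {
            int recStart = pos;
            // ...but read it to its end marker, even past the split boundary.
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            records.add(new String(data, recStart, pos - recStart, StandardCharsets.UTF_8));
            pos++; // move past the newline
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\ndelta\n".getBytes(StandardCharsets.UTF_8);
        int blockSize = 8; // deliberately tiny so that records straddle boundaries
        for (int start = 0; start < data.length; start += blockSize) {
            System.out.println("split@" + start + " -> " + readSplit(data, start, blockSize));
        }
    }
}
```

With this input, "bravo" and "charlie" both straddle a boundary, yet each record comes out whole and exactly once: the split starting at 0 yields alpha and bravo, the one at 8 yields charlie, the one at 16 yields delta, and the one at 24 yields nothing.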
All input formats provided in MR already handle this for you, so you needn't worry about it unless you're implementing a whole new reader from scratch.

On Fri, May 11, 2012 at 5:45 PM, <shreya....@cognizant.com> wrote:
> Hi
>
> When we store data into HDFS, it gets broken into small pieces and
> distributed across the cluster based on Block size for the file.
> While processing the data using MR program I want a particular record as a
> whole without it being split across nodes, but the data has already been
> split and stored in HDFS when I loaded the data.
> How would I make sure that my record doesn't get split, how would my Input
> format make a difference now?
>
> Regards
> Shreya

--
Harsh J