Thank you Jason! How about if I fix the size of each record to the size of
the largest record by padding the smaller records with dummy characters, and
then set setMaxInputSplitSize() and setMinInputSplitSize() of the
FileInputFormat class to that fixed size? The mapper would extract the actual
input after stripping the dummy characters, roughly as in the sketch below.
Do you think this could work? Thanks.
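Something along these lines, as a rough sketch using the new mapreduce API;
RECORD_SIZE is just a placeholder here for the padded size of the largest
record:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FixedRecordJob {
  // Placeholder: padded size of the largest record, in bytes.
  private static final long RECORD_SIZE = 256L * 1024 * 1024;

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "fixed-record-size");
    job.setJarByClass(FixedRecordJob.class);

    // Force every split to cover exactly one padded record.
    FileInputFormat.setMinInputSplitSize(job, RECORD_SIZE);
    FileInputFormat.setMaxInputSplitSize(job, RECORD_SIZE);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The mapper would then trim the trailing dummy characters from each value
before processing it.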
Regards,
Upendra
----- Original Message -----
From: "Jason Venner" <jason.had...@gmail.com>
To: <common-dev@hadoop.apache.org>
Sent: Friday, November 27, 2009 12:06 AM
Subject: Re: how to set one map task for each input key-value pair
The only thing that comes immediately to mind is to write your own custom
input format that knows how to tell where the boundaries are in your data
set, and uses those to specify the beginning and end of the input splits.
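For the custom input format route, a bare-bones sketch against the old
mapred API might look like the following. recordBoundaries() is a made-up
hook standing in for however your data set marks record boundaries (an index
file, magic bytes, etc.), and you would still need to supply a matching
RecordReader:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;

public abstract class RecordBoundaryInputFormat
    extends FileInputFormat<LongWritable, Text> {

  // Hypothetical hook: returns the byte offsets where records start,
  // plus the end-of-file offset as the last element.
  protected abstract long[] recordBoundaries(FileStatus file, JobConf job)
      throws IOException;

  @Override
  public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException {
    List<InputSplit> splits = new ArrayList<InputSplit>();
    for (FileStatus file : listStatus(job)) {
      long[] bounds = recordBoundaries(file, job);
      for (int i = 0; i < bounds.length - 1; i++) {
        // One split per record, so each record goes to its own map.
        splits.add(new FileSplit(file.getPath(), bounds[i],
            bounds[i + 1] - bounds[i], (String[]) null));
      }
    }
    return splits.toArray(new InputSplit[splits.size()]);
  }
}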
You can also tell the framework not to split your individual input files by
setting the minimum input split size (mapred.min.split.size) to
Long.MAX_VALUE.
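The min-split-size trick is just a one-line configuration change; a rough
sketch against the old mapred API (mapper, reducer, and paths here are only
placeholders):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class NoSplitJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(NoSplitJob.class);
    conf.setJobName("no-split");

    // With the minimum split size at Long.MAX_VALUE, each input file
    // becomes a single split and is handled by a single map.
    conf.setLong("mapred.min.split.size", Long.MAX_VALUE);

    conf.setOutputKeyClass(LongWritable.class);
    conf.setOutputValueClass(Text.class);
    conf.setMapperClass(IdentityMapper.class);
    conf.setReducerClass(IdentityReducer.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}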
On Thu, Nov 26, 2009 at 4:53 PM, Upendra Dadi <ud...@gmu.edu> wrote:
Hi,
I am trying to use MapReduce with some scientific data. I have key-value
pairs such that the size of the value can range from a few megabytes to
several hundred megabytes. What happens when the size of the value exceeds
the block size? How do I set it up so that each key-value pair is associated
with a separate map? Please, someone help. Thanks.
Regards,
Upendra
--
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals