Thanks Mori,
So far I cannot touch the large file; it's just one very long string, and I have to "approximately" match smaller strings against it. I will give FileSplit a try and check that I am not merging the two inputs together.
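
To make the concern about splitting one long string concrete, here is a rough Python sketch (not Hadoop code). It assumes "approximately" means "at most a few mismatched characters", which may not be the real criterion, and every function name in it is hypothetical. The idea is that chunks of the long string overlap by len(pattern) - 1 characters, so a match that straddles a chunk boundary is still seen whole by one chunk.

```python
def overlapping_chunks(text, chunk_size, overlap):
    """Yield (offset, chunk) pairs; consecutive chunks share `overlap` characters."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        yield start, text[start:start + chunk_size]
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text


def approx_find(chunk, pattern, max_mismatches):
    """Positions where pattern fits with at most max_mismatches substitutions.

    Assumption: mismatch counting stands in for the real "approximate" rule.
    """
    hits = []
    for i in range(len(chunk) - len(pattern) + 1):
        window = chunk[i:i + len(pattern)]
        mismatches = sum(a != b for a, b in zip(window, pattern))
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


def search(text, pattern, chunk_size, max_mismatches):
    """Search chunk by chunk and report positions relative to the full text."""
    overlap = len(pattern) - 1
    found = set()  # overlapping chunks can report the same position twice
    for offset, chunk in overlapping_chunks(text, chunk_size, overlap):
        for i in approx_find(chunk, pattern, max_mismatches):
            found.add(offset + i)
    return sorted(found)
```

In a real job, each chunk would correspond to one split handed to a mapper, so a custom input format would have to produce the overlap itself; how best to do that in Hadoop is not something this sketch answers.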

On Jul 11, 2008, at 1:41 PM, Mori Bellamy wrote:

Hey Amer,
It sounds to me like you're going to have to write your own input format (or at least modify an existing one). Take a look here:
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/FileSplit.html

I'm not sure how you'd go about doing this, but I hope this helps you.

(Also, have you considered preprocessing your input so that any arbitrary mapper can know whether or not it's looking at a line from the "large file"?)
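
A minimal sketch of that preprocessing idea, in plain Python rather than Hadoop code: each line is prefixed with a short source tag before the job runs, and the mapper reads the tag back. The tag values and function names are made up for illustration.

```python
LARGE_TAG = "L"  # hypothetical marker for lines from the large file
SMALL_TAG = "S"  # hypothetical marker for lines from the smaller files


def tag_lines(lines, tag):
    """Preprocessing step: prefix each line with its source tag and a tab."""
    for line in lines:
        yield f"{tag}\t{line}"


def split_tag(record):
    """Mapper side: recover (tag, original_line) from a tagged record."""
    tag, _, line = record.partition("\t")  # split on the first tab only
    return tag, line


def is_from_large_file(record):
    """True if the record was tagged as coming from the large file."""
    return split_tag(record)[0] == LARGE_TAG
```

With the input tagged this way, any mapper can branch on is_from_large_file(record) no matter which file its split came from. The cost is one extra pass over the data to add the tags.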
On Jul 11, 2008, at 12:31 PM, Muhammad Ali Amer wrote:

Hi,
My requirement is to compare the contents of one very large file (GB to TB size) with a bunch of smaller files (100s of MB to GB sizes). Is there a way I can give the mapper the 1st file independently of the remaining bunch?
Amer



Muhammad Ali Amer
Center For Grid Technologies
Information Sciences Institute
USC Viterbi School Of Engg
Tel : (310) 448-8349
