Hey Amer,
It sounds to me like you're going to have to write your own input format (or at least modify an existing one). Take a look here:
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/FileSplit.html

I'm not sure exactly how you'd go about doing this, but I hope this helps you.

(Also, have you considered preprocessing your input so that any arbitrary mapper can know whether or not it's looking at a line from the "large file"?)
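
To make the preprocessing idea a bit more concrete, here is a rough sketch in plain Java (no Hadoop classes). It assumes the input is line-oriented text, and the class name LineTagger, the "L"/"S" tags, and the tab separator are all made up for illustration. The idea is to rewrite each file once so that every line carries a marker for where it came from; a mapper could then call something like isFromLargeFile() on each record. For a TB-sized file, a pass like this would probably have to run as a job of its own rather than on a single machine.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: prefixes every line of a file with a source tag.
public class LineTagger {
    static final String LARGE_TAG = "L"; // assumed tag for the one large file
    static final String SMALL_TAG = "S"; // assumed tag for the smaller files

    // Build a tagged line: <tag> TAB <original line>.
    static String tag(String tag, String line) {
        return tag + "\t" + line;
    }

    // What a mapper could call to decide which side a record belongs to.
    static boolean isFromLargeFile(String taggedLine) {
        return taggedLine.startsWith(LARGE_TAG + "\t");
    }

    // Recover the original line (everything after the first tab).
    static String stripTag(String taggedLine) {
        int tab = taggedLine.indexOf('\t');
        return tab < 0 ? taggedLine : taggedLine.substring(tab + 1);
    }

    // Stream the input line by line so the file never has to fit in memory.
    static void tagFile(Path in, Path out, String tag) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(tag(tag, line));
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: LineTagger <L|S> <input> <output>");
            System.exit(1);
        }
        tagFile(Paths.get(args[1]), Paths.get(args[2]), args[0]);
    }
}
```

You would run it once per file, e.g. "java LineTagger L big.txt big.tagged" for the large file and with S for each of the smaller ones (the file names here are just placeholders).
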
On Jul 11, 2008, at 12:31 PM, Muhammad Ali Amer wrote:

Hi,
My requirement is to compare the contents of one very large file (GB to TB in size) with a bunch of smaller files (hundreds of MB to GB in size). Is there a way I can give the mapper the first file independently of the remaining bunch?
Amer
