Re: Is it possible to input two different files under same mapper

Jason Venner Mon, 14 Jul 2008 06:48:54 -0700

This sounds like a good task for the Data Join code.

If you can set things up so that all of your data is stored in MapFiles, with the same type of key and the same partitioning setup and count, it will go very well.
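
As a rough illustration of why the matching key type and sort order matter, here is a minimal sketch in plain Java of a merge join over two key-sorted inputs. It does not use the Hadoop Data Join API: the class name SortedMergeJoin and the method join are made up for this example, and it assumes each key appears at most once per input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: joins two inputs that are
// already sorted by the same key type, the way two MapFile partitions
// with identical partitioning would be.
class SortedMergeJoin {

    // Returns one "key<TAB>leftValue<TAB>rightValue" line per key present
    // in both inputs. Assumes both lists are sorted ascending by key and
    // that keys are unique within each list.
    public static List<String> join(List<Map.Entry<String, String>> left,
                                    List<Map.Entry<String, String>> right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            Map.Entry<String, String> l = left.get(i);
            Map.Entry<String, String> r = right.get(j);
            int cmp = l.getKey().compareTo(r.getKey());
            if (cmp == 0) {
                // Same key on both sides: emit the joined record.
                out.add(l.getKey() + "\t" + l.getValue() + "\t" + r.getValue());
                i++;
                j++;
            } else if (cmp < 0) {
                i++; // left key is smaller, advance the left side
            } else {
                j++; // right key is smaller, advance the right side
            }
        }
        return out;
    }
}
```

Because both sides are read once in key order, neither input has to fit in memory, which is what makes this arrangement attractive when one side is very large.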


Mori Bellamy wrote:

Hey Amer,
It sounds to me like you're going to have to write your own InputFormat (or at least modify an existing one). Take a look here: http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/FileSplit.html
I'm not sure how you'd go about doing this, but I hope this helps you.
(Also, have you considered preprocessing your input so that any arbitrary mapper can know whether or not it's looking at a line from the "large file"?)
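
One possible shape for that preprocessing step, sketched in plain Java: prefix every line with a short source tag so the map function can tell the inputs apart. The class SourceTagger, the tag values "L" and "S", and the tab separator are all assumptions made for this example, not anything Hadoop prescribes.

```java
// Hypothetical helper, not part of Hadoop: tags each input line with
// the kind of file it came from, and lets a mapper read the tag back.
class SourceTagger {

    public static final String LARGE = "L"; // assumed tag for the large file
    public static final String SMALL = "S"; // assumed tag for the smaller files

    // Preprocessing side: prepend the tag and a tab to the raw line.
    public static String tag(String sourceTag, String line) {
        return sourceTag + "\t" + line;
    }

    // Mapper side: true if the tagged line came from the large file.
    public static boolean isFromLargeFile(String taggedLine) {
        return taggedLine.startsWith(LARGE + "\t");
    }

    // Mapper side: strip the tag and return the original line.
    // Only the first tab is treated as the separator, so tabs inside
    // the original line are preserved.
    public static String payload(String taggedLine) {
        return taggedLine.substring(taggedLine.indexOf('\t') + 1);
    }
}
```

A map function would call isFromLargeFile on each incoming value and branch on the result, using payload to recover the untouched line.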
On Jul 11, 2008, at 12:31 PM, Muhammad Ali Amer wrote:
Hi,
My requirement is to compare the contents of one very large file (GB to TB size) with a bunch of smaller files (100s of MB to GB sizes). Is there a way I can give the mapper the 1st file independently of the remaining bunch?
Amer

--
Jason Venner
Attributor - Program the Web <http://www.attributor.com/>

Attributor is hiring Hadoop Wranglers and coding wizards, contact if interested
