Why not just pass the large file's name as an argument to your mappers?  Each
mapper could then open and read that file as it saw fit, without having to go
through contortions with a custom input format.
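
As a rough illustration only, here is a minimal sketch of that idea written as
a Hadoop Streaming-style mapper in Python. Everything in it is an assumption
made for the example: the script layout, the names find_approx and
hamming_within, the use of Hamming distance as the "approximate" match, and
the premise that the large file is readable at the same path from every task
node. The small strings arrive one per line on stdin; the large file is
streamed in chunks so it never has to fit in memory.

```python
import sys


def hamming_within(a, b, limit):
    """Return True if equal-length strings a and b differ in at most `limit` positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > limit:
                return False
    return True


def find_approx(large_path, pattern, limit, chunk_size=1 << 20):
    """Yield character offsets in the large file where `pattern` matches
    with at most `limit` mismatches (hypothetical helper, not a Hadoop API).

    The last len(pattern) - 1 characters of each window are carried over,
    so a match that straddles a chunk boundary is still found exactly once.
    """
    m = len(pattern)
    if m == 0:
        return
    carry = ""
    base = 0  # file offset of the first character in `carry`
    with open(large_path, "r") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            window = carry + chunk
            for i in range(len(window) - m + 1):
                if hamming_within(window[i:i + m], pattern, limit):
                    yield base + i
            carry = window[-(m - 1):] if m > 1 else ""
            base += len(window) - len(carry)


def main(argv):
    large_path = argv[1]  # assumption: path is given on the mapper command line
    limit = int(argv[2]) if len(argv) > 2 else 1
    for line in sys.stdin:
        pattern = line.strip()
        for offset in find_approx(large_path, pattern, limit):
            print("%s\t%d" % (pattern, offset))


# Only run as a mapper when a large-file path was actually supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

With streaming, the mapper command would then look something like
-mapper "mapper.py /some/shared/path/large.txt 1", where the path and the
mismatch limit are placeholders. The brute-force scan is only there to keep
the sketch short; a real job would want a smarter matching algorithm.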

Miles

2008/7/11 Muhammad Ali Amer <[EMAIL PROTECTED]>:

> Thanks Mori,
> So far I cannot touch the large file; it's just a very, very long string,
> and I have to "approximately" match smaller strings against it. I will give
> it a try with the FileSplit and see whether I am merging the two together.
>
> On Jul 11, 2008, at 1:41 PM, Mori Bellamy wrote:
>
>> Hey Amer,
>> It sounds to me like you're going to have to write your own input format
>> (or at least modify an existing one). Take a look here:
>>
>> http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/FileSplit.html
>>
>> I'm not sure how you'd go about doing this, but I hope this helps you.
>>
>> (Also, have you considered preprocessing your input so that any arbitrary
>> mapper can know whether or not it's looking at a line from the "large file"?)
>> On Jul 11, 2008, at 12:31 PM, Muhammad Ali Amer wrote:
>>
>>> Hi,
>>> My requirement is to compare the contents of one very large file (GB to
>>> TB in size) with a bunch of smaller files (100s of MB to GB in size). Is there a
>>> way I can give the mapper the first file independently of the remaining bunch?
>>> Amer
>>>
>>
>>
>>
> Muhammad Ali Amer
> Center For Grid Technologies
> Information Sciences Institute
> USC Viterbi School Of Engg
> Tel : (310) 448-8349
>
>


-- 
The University of Edinburgh is a charitable body, registered in Scotland,
with registration number SC005336.
