The problem here is that you don't want each mapper/reducer to hold its
own copy of the data. You want that data (which can be very large)
stored in a distributed manner over your cluster, with random access
to it during computation.

(This is what HBase etc. do.)
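
As a rough sketch, a point lookup against HBase from inside a task
looks something like the code below. (This uses the newer HBase client
API rather than what shipped at the time of this thread; the table
name "lookup" and the column "d:v" are just placeholders.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseLookup {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Each task does point reads against the region servers; the data
    // itself stays distributed over the cluster instead of being
    // copied into every task.
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("lookup"))) {
      Result r = table.get(new Get(Bytes.toBytes("some-key")));
      byte[] value = r.getValue(Bytes.toBytes("d"), Bytes.toBytes("v"));
      System.out.println(value == null ? "miss" : Bytes.toString(value));
    }
  }
}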

Miles

2008/9/19 Stuart Sierra <[EMAIL PROTECTED]>:
> On Thu, Sep 18, 2008 at 1:05 AM, Chris Dyer <[EMAIL PROTECTED]> wrote:
>> Basically, I'd like to be able to
>> load the entire contents of a key-value map file in DFS into
>> memory across many machines in my cluster so that I can access any of
>> it with ultra-low latencies.
>
> I think the simplest way, which I've used, is to put your key-value
> file into DistributedCache, then load it into a HashMap or ArrayList
> in the configure method of each Map/Reduce task.
>
> -Stuart
>
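
For reference, that DistributedCache approach looks roughly like the
sketch below, using the old JobConf-style API with its configure
method. (The class name, the tab-separated key/value file format, and
the cache path in the driver comment are placeholders.)

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class LookupMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private final Map<String, String> lookup = new HashMap<String, String>();

  // Runs once per task before any map() calls: read the cached file
  // from local disk and load it into an in-memory HashMap.
  public void configure(JobConf job) {
    try {
      Path[] cached = DistributedCache.getLocalCacheFiles(job);
      BufferedReader in =
          new BufferedReader(new FileReader(cached[0].toString()));
      String line;
      while ((line = in.readLine()) != null) {
        String[] kv = line.split("\t", 2);  // assumes tab-separated lines
        if (kv.length == 2) {
          lookup.put(kv[0], kv[1]);
        }
      }
      in.close();
    } catch (IOException e) {
      throw new RuntimeException("Failed to load cached lookup file", e);
    }
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> out, Reporter reporter)
      throws IOException {
    String hit = lookup.get(value.toString());  // in-memory random access
    if (hit != null) {
      out.collect(value, new Text(hit));
    }
  }
}

// In the job driver, register the file before submitting:
//   DistributedCache.addCacheFile(new java.net.URI("/user/me/lookup.tsv"), jobConf);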



