Johan Oskarsson wrote:
> Any advice on how to solve this problem?
I think your current solutions sound reasonable.
> Would it be possible to somehow share a hashmap between tasks?
Not without running multiple tasks in the same JVM. We could implement
a mode where child tasks are run directly
There's also code floating around for a Multithreaded MapRunner. This (with
appropriate synchronization) would allow a shared HashMap without having to
pay the per-simultaneous-map overhead.
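To make the shared-map idea concrete, here is a minimal sketch in plain Java of several worker threads in one JVM checking input values against a single in-memory table. It does not use any Hadoop API: the class name SharedLookupSketch, the method countValid, and the sample keys are all made up for illustration, and a thread pool stands in for whatever a multithreaded map runner would actually do. ConcurrentHashMap is used here as the "appropriate synchronization"; a plain HashMap that is fully loaded before the threads start and never modified afterwards would probably also do.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only: not part of Hadoop.
class SharedLookupSketch {

    // Counts how many records appear as keys in the shared lookup table,
    // checking the records on a pool of `threads` worker threads.
    static int countValid(List<String> records, Map<String, Boolean> lookup, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger valid = new AtomicInteger();
        for (String record : records) {
            // Every task reads the same map instance; nothing is copied per thread.
            pool.execute(() -> {
                if (lookup.containsKey(record)) {
                    valid.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return valid.get();
    }

    public static void main(String[] args) {
        // The table is loaded once per JVM, however many threads read it.
        Map<String, Boolean> lookup = new ConcurrentHashMap<>();
        lookup.put("alice", true);
        lookup.put("bob", true);

        List<String> records = Arrays.asList("alice", "carol", "bob", "dave");
        System.out.println(countValid(records, lookup, 4)); // prints 2
    }
}
```

The point of the sketch is only that memory for the table is paid once per JVM rather than once per simultaneous map task; how the table gets loaded and how records reach the threads would depend on the actual runner.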
Another thing that might or might not make sense would be to use memcached
for your hashtable. This may
Hi.
Currently some of my MapReduce jobs need quick access to additional
data to check some input values in the map phase.
This data is currently held in memory in a hashmap. It's very quick, but
as each job starts several JVMs, the data will be held in memory multiple
times. It will also