Here's a setup I've used:
- configuration data distributed to the mappers / reducers using the
JobConf object (first sketch after this list)
- BDBs (stored in ZIP packages on the HDFS) used for read/write data
across stages. The data flow is organized so that a single mapper
modifies a single database per stage, which avoids concurrency issues
(second sketch after this list).
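Roughly what the JobConf approach looks like, as a minimal sketch
against the classic org.apache.hadoop.mapred API (the property name
"myapp.batch.id" is made up for illustration). The driver sets the
value once at submission time, and every task reads it back in
configure():

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class ConfAwareMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    private String batchId;

    // Called once per task; the value set on the JobConf at submission
    // time (e.g. conf.set("myapp.batch.id", "batch-42")) shows up here
    // in every mapper and reducer.
    public void configure(JobConf job) {
        batchId = job.get("myapp.batch.id", "default");
    }

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        // Tag each record with the distributed configuration value.
        output.collect(new Text(batchId), value);
    }
}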
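For the BDB piece, here is a hedged sketch using the Berkeley DB Java
Edition API (com.sleepycat.je); the database name is hypothetical, and
the unzip / re-zip steps to and from the HDFS are elided:

import java.io.File;
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseConfig;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.DatabaseException;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;
import com.sleepycat.je.LockMode;
import com.sleepycat.je.OperationStatus;

public class LocalBdbStore {
    private final Environment env;
    private final Database db;

    // localDir is the directory the mapper unzipped from the HDFS.
    public LocalBdbStore(File localDir) throws DatabaseException {
        EnvironmentConfig envConfig = new EnvironmentConfig();
        envConfig.setAllowCreate(true);
        env = new Environment(localDir, envConfig);

        DatabaseConfig dbConfig = new DatabaseConfig();
        dbConfig.setAllowCreate(true);
        db = env.openDatabase(null, "stage-data", dbConfig);
    }

    public void put(String key, String value) throws DatabaseException {
        db.put(null, new DatabaseEntry(key.getBytes()),
               new DatabaseEntry(value.getBytes()));
    }

    public String get(String key) throws DatabaseException {
        DatabaseEntry found = new DatabaseEntry();
        OperationStatus status = db.get(null,
                new DatabaseEntry(key.getBytes()), found, LockMode.DEFAULT);
        return status == OperationStatus.SUCCESS
                ? new String(found.getData()) : null;
    }

    // Close cleanly before zipping the directory back onto the HDFS.
    public void close() throws DatabaseException {
        db.close();
        env.close();
    }
}

Since only one mapper ever opens a given environment per stage, no
transactions or cross-process locking are needed here.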
The concurrency of the shared read/write data will affect the choice of
storage. In general you get better performance by keeping as much data
local as possible and then distributing it (e.g. storing it on the
HDFS) at the end of the mapper job. If you need all mappers to share
the same data at once, then a technology like memcached seems like a
good approach; there's a sketch of that below.
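If you go the memcached route, the spymemcached client is one option; a
small sketch (the host, port, and key names are just placeholders):

import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class SharedValueDemo {
    public static void main(String[] args) throws Exception {
        MemcachedClient client = new MemcachedClient(
                new InetSocketAddress("memcache-host", 11211));

        // One mapper writes (set is async, so block until it lands)...
        client.set("stage1/maxSeen", 3600, "42").get();

        // ...and every other mapper sees the same value right away.
        String maxSeen = (String) client.get("stage1/maxSeen");
        System.out.println("maxSeen = " + maxSeen);

        client.shutdown();
    }
}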
Chandraprakash Bhagtani wrote:
If you really want to share read/write data you can use memcached server or
file based database like Tokyocabinet or BDB