On Fri, Sep 5, 2008 at 12:59 AM, Amit Kumar Singh
<[EMAIL PROTECTED]> wrote:

> Can we use something like RAM FS to share static data across map tasks.


As others have said, this won't work right. You probably should look at
MultithreadedMapRunner <http://hadoop.apache.org/core/docs/r0.17.2/api/org/apache/hadoop/mapred/lib/MultithreadedMapRunner.html>,
which uses a thread pool to process the inputs. It is typically used for
crawling or other map methods that take a long time per record. If you have
substantial work inside the map, you can saturate your CPUs that way. Of course,
the downside is that you have a single RecordReader feeding you inputs, so
you are limited by the reading speed of a single HDFS client.
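To make the pattern concrete, here is a minimal sketch of the idea: one reader produces records and a pool of worker threads runs the map function on them. It is written in Python for brevity and is NOT Hadoop's implementation; the real MultithreadedMapRunner is a Java class you select on the JobConf. The names record_reader, slow_map and multithreaded_map are hypothetical. The per-record cost is simulated with a sleep, which matches the crawling case. Note that CPython threads only overlap waiting (I/O, latency), not CPU-bound Python code, whereas Java threads can spread CPU work across cores.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def record_reader(n):
    """Hypothetical stand-in for a single RecordReader: yields (key, value) pairs."""
    for i in range(n):
        yield i, f"http://example.com/page/{i}"


def slow_map(record):
    """Hypothetical map function whose cost is latency, e.g. fetching a page."""
    key, url = record
    time.sleep(0.05)  # simulated network latency per record
    return key, len(url)


def multithreaded_map(records, map_fn, num_threads):
    """Run map_fn over records using a pool of num_threads worker threads.

    pool.map pulls every record from the iterable in the calling thread, so all
    input still goes through a single reader; only the map work is parallel.
    Results are returned in input order.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(map_fn, records))


if __name__ == "__main__":
    start = time.perf_counter()
    results = multithreaded_map(record_reader(40), slow_map, num_threads=10)
    print(len(results), f"{time.perf_counter() - start:.2f}s")
```

With 40 records at about 0.05 s each, a sequential loop needs roughly 2 s, while 10 threads should finish in a fraction of that. The speedup stops once the single reader, rather than the map work, becomes the bottleneck.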

-- Owen
