On Fri, Sep 5, 2008 at 12:59 AM, Amit Kumar Singh <[EMAIL PROTECTED]> wrote:
> Can we use something like RAM FS to share static data across map tasks?

As others have said, this won't work right. You should probably look at MultithreadedMapRunner <http://hadoop.apache.org/core/docs/r0.17.2/api/org/apache/hadoop/mapred/lib/MultithreadedMapRunner.html>, which uses a thread pool to process the inputs. It is typically used for crawling or other map methods that take a long time per record. If you have substantial work inside the map, you can saturate the CPUs that way. Of course, the downside is that you have a single RecordReader feeding you inputs, so you are limited by the reading speed of a single HDFS client.

-- Owen
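
To make the thread-pool idea concrete, here is a rough, self-contained sketch of the pattern in plain Java. It is not Hadoop's MultithreadedMapRunner code and uses no Hadoop classes; the names PooledMapSketch and runAll are made up for illustration, and the Iterator merely stands in for the single RecordReader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Illustrative only: hypothetical class, not part of Hadoop.
class PooledMapSketch {

    // One thread drains the reader (the single-reader limit mentioned above)
    // and hands each record to a fixed-size pool that runs the slow map function.
    static <I, O> List<O> runAll(Iterator<I> reader, Function<I, O> map, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<O>> pending = new ArrayList<>();
            while (reader.hasNext()) {
                final I record = reader.next();
                Callable<O> task = () -> map.apply(record);
                pending.add(pool.submit(task));
            }
            // Collect results in submission order; get() blocks until each task is done.
            List<O> results = new ArrayList<>();
            for (Future<O> f : pending) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> records = Arrays.asList("a", "bb", "ccc");
        // String::length stands in for an expensive per-record map call.
        System.out.println(runAll(records.iterator(), String::length, 2)); // prints [1, 2, 3]
    }
}
```

The pool only helps when the map call dominates the cost of each record; if reading is the slow part, the single feeding loop stays the bottleneck no matter how many threads are configured.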
