Given the goal of having shared data accessible across the Map instances, can someone please explain some of the differences between using:

- setNumTasksToExecutePerJvm() and then having statically declared data initialised in Mapper.configure(); and
- a MultithreadedMapRunner?

Regards,
Shane

On Wed, Nov 26, 2008 at 6:41 AM, Doug Cutting <[EMAIL PROTECTED]> wrote:
> tim robertson wrote:
>>
>> Thanks Alex - this will allow me to share the shapefile, but I need to
>> "one time only per job per jvm" read it, parse it and store the
>> objects in the index.
>> Is the Mapper.configure() the best place to do this? E.g. will it
>> only be called once per job?
>
> In 0.19, with HADOOP-249, all tasks from a job can be run in a single JVM.
> So, yes, you could access a static cache from Mapper.configure().
>
> Doug
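
For reference, the "static cache from Mapper.configure()" idea can be sketched roughly as below. This is plain Java, not the Hadoop API: the class name SharedIndexSketch, the configure() stand-in, and the fake index contents are all hypothetical, and the real shapefile parsing is replaced by a placeholder. The guard is written to be safe even if several threads reach configure() at once, which is an assumption made for safety rather than something stated in the thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedIndexSketch {
    // Hypothetical stand-in for the parsed shapefile index, shared JVM-wide.
    private static volatile Map<String, String> index;

    // Counts how often the expensive load actually ran (for illustration only).
    private static final AtomicInteger loads = new AtomicInteger();

    // Stand-in for Mapper.configure(): assumed to be called once per task.
    // Only the first call in a given JVM performs the load.
    public static void configure() {
        if (index == null) {
            synchronized (SharedIndexSketch.class) {
                if (index == null) {
                    index = loadIndex();
                }
            }
        }
    }

    // Placeholder for "read it, parse it and store the objects in the index".
    private static Map<String, String> loadIndex() {
        loads.incrementAndGet();
        Map<String, String> m = new ConcurrentHashMap<>();
        m.put("region-1", "polygon-A"); // made-up entry
        return m;
    }

    public static String lookup(String key) {
        return index.get(key);
    }

    public static int loadCount() {
        return loads.get();
    }

    public static void main(String[] args) {
        // Simulate three tasks of one job running in the same JVM.
        for (int task = 0; task < 3; task++) {
            configure();
        }
        System.out.println(loadCount() + " " + lookup("region-1"));
    }
}
```

In this sketch the three simulated tasks trigger a single load; whether a real job behaves the same way depends on the tasks actually sharing a JVM.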