Thanks Chris - I have a different test running at the moment, then I will implement that. I might give Cascading a shot for what I am doing.
Cheers
Tim

On Tue, Nov 25, 2008 at 9:24 PM, Chris K Wensel <[EMAIL PROTECTED]> wrote:
> Hey Tim
>
> The .configure() method is what you are looking for, I believe.
>
> It is called once per task, which in the default case is once per JVM.
>
> Note that jobs are broken into parallel tasks, and each task handles a
> portion of the input data. So while you may create your map 100 times
> because there are 100 tasks, it will only be created once per JVM.
>
> I hope this makes sense.
>
> chris
>
> On Nov 25, 2008, at 11:46 AM, tim robertson wrote:
>
>> Hi Doug,
>>
>> Thanks - it is not so much that I want to run in a single JVM - I do
>> want a bunch of machines doing the work; I just want them all to have
>> this in-memory lookup index, configured once per job. Is there some
>> hook somewhere that would let me trigger a read from the distributed
>> cache, or is Mapper.configure() the best place for this? Can it be
>> called multiple times per job, meaning I need to keep some static
>> synchronised indicator flag?
>>
>> Thanks again,
>>
>> Tim
>>
>>
>> On Tue, Nov 25, 2008 at 8:41 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
>>>
>>> tim robertson wrote:
>>>>
>>>> Thanks Alex - this will allow me to share the shapefile, but I need
>>>> to read it, parse it and store the objects in the index "one time
>>>> only per job per JVM".
>>>> Is Mapper.configure() the best place to do this? E.g. will it only
>>>> be called once per job?
>>>
>>> In 0.19, with HADOOP-249, all tasks from a job can be run in a single
>>> JVM. So, yes, you could access a static cache from Mapper.configure().
>>>
>>> Doug
>
> --
> Chris K Wensel
> [EMAIL PROTECTED]
> http://chris.wensel.net/
> http://www.cascading.org/
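For anyone finding this thread later, here is a minimal sketch of the
pattern Chris and Doug describe, written against the 0.19-era
org.apache.hadoop.mapred API. The class name LookupMapper and the
tab-separated placeholder parser are illustrative assumptions - a real
job would parse the shapefile into a spatial index instead - but the
static double-checked guard in configure() and the
DistributedCache.getLocalCacheFiles() call are the pieces discussed
above:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class LookupMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

      // Shared by every task that reuses this JVM; built at most once.
      // A plain Map stands in for the real parsed-shapefile index.
      private static volatile Map<String, String> index;

      public void configure(JobConf conf) {
        // configure() runs once per task, so with JVM reuse (HADOOP-249)
        // it can run several times in the same JVM - hence the static
        // "indicator flag" guard Tim describes, done here as
        // double-checked locking on a volatile field.
        if (index == null) {
          synchronized (LookupMapper.class) {
            if (index == null) {
              try {
                // Files registered via DistributedCache.addCacheFile(...)
                // at submission time appear here as local paths.
                Path[] cached = DistributedCache.getLocalCacheFiles(conf);
                index = parse(cached[0]);
              } catch (IOException e) {
                throw new RuntimeException("could not load lookup index", e);
              }
            }
          }
        }
      }

      // Placeholder parser (an assumption for this sketch): reads
      // "key<TAB>value" lines. The real version would parse the
      // shapefile and build a spatial index instead.
      private static Map<String, String> parse(Path local) throws IOException {
        Map<String, String> m = new HashMap<String, String>();
        BufferedReader in = new BufferedReader(new FileReader(local.toString()));
        try {
          String line;
          while ((line = in.readLine()) != null) {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2) m.put(parts[0], parts[1]);
          }
        } finally {
          in.close();
        }
        return m;
      }

      public void map(LongWritable key, Text value,
          OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        // Every record gets a cheap in-memory lookup; no per-record I/O.
        String hit = index.get(value.toString());
        if (hit != null) output.collect(value, new Text(hit));
      }
    }

The index is loaded lazily rather than eagerly so that tasks which never
call configure() (e.g. reducers sharing the JVM) pay nothing; the
volatile field plus the synchronized block keeps the load to one pass
per JVM even when several tasks start concurrently.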