Thanks Chris,

I have a different test running; once that finishes I will implement
this.  I might give Cascading a shot for what I am doing.

Cheers

Tim


On Tue, Nov 25, 2008 at 9:24 PM, Chris K Wensel <[EMAIL PROTECTED]> wrote:
> Hey Tim
>
> The .configure() method is what you are looking for, I believe.
>
> It is called once per task, which in the default case is once per JVM.
>
> Note that jobs are broken into parallel tasks, and each task handles a
> portion of the input data. So while you may create your map 100 times
> because there are 100 tasks, it will still only be created once per JVM.
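>
> A minimal sketch of that pattern (untested, against the old
> org.apache.hadoop.mapred API; the index type and the buildIndex()
> helper are placeholders for whatever you actually load):
>
>   import java.io.IOException;
>   import java.util.HashMap;
>   import java.util.Map;
>   import org.apache.hadoop.io.LongWritable;
>   import org.apache.hadoop.io.Text;
>   import org.apache.hadoop.mapred.JobConf;
>   import org.apache.hadoop.mapred.MapReduceBase;
>   import org.apache.hadoop.mapred.Mapper;
>   import org.apache.hadoop.mapred.OutputCollector;
>   import org.apache.hadoop.mapred.Reporter;
>
>   public class LookupMapper extends MapReduceBase
>       implements Mapper<LongWritable, Text, Text, Text> {
>
>     // Shared by every task that runs in this JVM.
>     private static Map<String, String> index;
>
>     @Override
>     public void configure(JobConf job) {
>       // Called once per task; the guard makes the expensive load
>       // happen only once per JVM when tasks share a JVM.
>       synchronized (LookupMapper.class) {
>         if (index == null) {
>           index = buildIndex(job);
>         }
>       }
>     }
>
>     public void map(LongWritable key, Text value,
>                     OutputCollector<Text, Text> output,
>                     Reporter reporter) throws IOException {
>       // ... look values up in the in-memory index here ...
>     }
>
>     private static Map<String, String> buildIndex(JobConf job) {
>       // Placeholder: read and parse the shared file into a lookup.
>       return new HashMap<String, String>();
>     }
>   }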
>
> I hope this makes sense.
>
> chris
>
> On Nov 25, 2008, at 11:46 AM, tim robertson wrote:
>
>> Hi Doug,
>>
>> Thanks - it is not so much that I want to run in a single JVM - I do
>> want a bunch of machines doing the work; it is just that I want them
>> all to have this in-memory lookup index, configured once per job.  Is
>> there some hook somewhere from which I can trigger a read from the
>> distributed cache, or is Mapper.configure() the best place for this?
>> Can it be called multiple times per job, meaning I need to keep some
>> static synchronised indicator flag, something like the sketch below?
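>>
>> For concreteness, the kind of guard I mean (a rough sketch only; I am
>> assuming DistributedCache.getLocalCacheFiles() is the right call to
>> make from inside configure(), and MyMapper is a stand-in name):
>>
>>   import java.io.IOException;
>>   import org.apache.hadoop.filecache.DistributedCache;
>>   import org.apache.hadoop.fs.Path;
>>   import org.apache.hadoop.mapred.JobConf;
>>
>>   // inside the mapper class
>>   private static boolean loaded = false;
>>
>>   public void configure(JobConf job) {
>>     synchronized (MyMapper.class) {
>>       if (!loaded) {
>>         try {
>>           // Local paths of the files shipped via the distributed
>>           // cache, as localised on this node.
>>           Path[] cached = DistributedCache.getLocalCacheFiles(job);
>>           // ... read and parse cached[0] into the in-memory index ...
>>           loaded = true;
>>         } catch (IOException e) {
>>           throw new RuntimeException(e);
>>         }
>>       }
>>     }
>>   }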
>>
>> Thanks again,
>>
>> Tim
>>
>>
>> On Tue, Nov 25, 2008 at 8:41 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
>>>
>>> tim robertson wrote:
>>>>
>>>> Thanks Alex - this will allow me to share the shapefile, but I need
>>>> to read it, parse it, and store the objects in the index only once
>>>> per job per JVM.
>>>> Is Mapper.configure() the best place to do this?  E.g. will it only
>>>> be called once per job?
>>>
>>> In 0.19, with HADOOP-249, all tasks from a job can be run in a single
>>> JVM.
>>> So, yes, you could access a static cache from Mapper.configure().
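>>>
>>> (If I recall correctly, the reuse itself is opt-in via the
>>> mapred.job.reuse.jvm.num.tasks property; a hypothetical sketch:)
>>>
>>>   JobConf conf = new JobConf(MyJob.class);  // MyJob is a placeholder
>>>   // -1 = reuse the task JVM for an unlimited number of tasks, i.e.
>>>   // mapred.job.reuse.jvm.num.tasks = -1.
>>>   conf.setNumTasksToExecutePerJvm(-1);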
>>>
>>> Doug
>>>
>>>
>
> --
> Chris K Wensel
> [EMAIL PROTECTED]
> http://chris.wensel.net/
> http://www.cascading.org/
>
>
