Looking through MultithreadedMapRunner, map() seems to be the only method
called by executorService:

    MultithreadedMapRunner.this.mapper.map(key, value, output, reporter);
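
If I read that right, the runner holds one mapper object and only hands map()
calls to the thread pool, so instance fields would be visible to every worker
thread. Here is a toy sketch of that pattern in plain Java. It uses no Hadoop
classes: ToyMapper, runTask and the counters are made-up names for
illustration, and this is not the actual MultithreadedMapRunner code.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedMapperSketch {

    // Hypothetical stand-in for a user mapper; not a Hadoop class.
    static class ToyMapper {
        // Instance field (plays the role of A in the question).
        final AtomicInteger instanceCalls = new AtomicInteger();
        // Static field (plays the role of B); one copy per JVM.
        static final AtomicInteger staticCalls = new AtomicInteger();
        // Names of the worker threads that actually invoked map().
        final Set<String> threadsSeen = ConcurrentHashMap.newKeySet();

        void map(int key) {
            instanceCalls.incrementAndGet();
            staticCalls.incrementAndGet();
            threadsSeen.add(Thread.currentThread().getName());
        }
    }

    // Creates ONE mapper, then submits every map() call to a fixed pool,
    // mirroring the "only map() goes through the executor" observation.
    static ToyMapper runTask(int threads, int records) {
        ToyMapper mapper = new ToyMapper();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < records; i++) {
            final int key = i;
            pool.execute(() -> mapper.map(key));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("map() calls did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return mapper;
    }

    public static void main(String[] args) {
        ToyMapper m = runTask(4, 1000);
        System.out.println("map() calls seen by the single instance: " + m.instanceCalls.get());
        System.out.println("worker threads that called map(): " + m.threadsSeen.size());
    }
}
```

In this sketch every worker thread updates the same instanceCalls counter,
because there is only one ToyMapper. Under the same assumption, any
non-thread-safe state kept in mapper fields (instance or static) would need
its own synchronization.
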
On Tue, Apr 27, 2010 at 3:46 PM, Jim Twensky <[email protected]> wrote:
> Hi,
>
> I've decided to refactor some of my Hadoop jobs and implement them
> using MultithreadedMapper.class but I got puzzled because of some
> unexpected error messages at run time.
> Here are some relevant settings regarding my Hadoop cluster:
>
> mapred.tasktracker.map.tasks.maximum = 1
> mapred.tasktracker.reduce.tasks.maximum = 1
> mapred.job.reuse.jvm.num.tasks = -1
> mapred.map.multithreadedrunner.threads = 4
>
> I'd like to know how threads are used to run the map task in a single
> JVM (Correct me if this is wrong). Suppose I've got a sample Mapper
> class as such:
>
> class Mapper ... {
>
> MyObject A;
> static MyObject B;
>
> setup() {
> Configuration conf = context.getConfiguration();
> A.initialize(conf);
> B.initialize(conf);
> }
>
> map() {...}
>
> cleanup() {...}
> }
>
> Does each thread run all three of the setup(), map(), and cleanup() methods?
>
> -OR-
>
> Are setup() and cleanup() run once per task (and thus per JVM
> according to my settings) and so map is the only multithreaded
> function?
> Also, are the objects A and B shared among different threads or does
> each thread have its own copy of them? My initial guess was that each
> thread would have a separate copy of A, and B would be shared among
> the 4 threads running on the same box since it is defined as static,
> but it appears to me that this assumption is not correct and A seems
> to be shared.
>
> Thanks,
> Jim
>