That's right.
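Since map() is called on a single Mapper instance from several threads, its
members are shared. You need to balance the cost of synchronization against
the parallelism provided by MultithreadedMapRunner.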
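
To illustrate, here is a minimal sketch in plain Java with no Hadoop
dependency. CountingMapper, SharedMapperDemo and runDemo are hypothetical
names made up for this example, and the thread pool only approximates what
the runner does: several threads call map() on one shared instance, so the
instance field has to be thread-safe. AtomicLong is one way to do that.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class SharedMapperDemo {

    // Hypothetical stand-in for a Mapper; this is not a Hadoop class.
    static class CountingMapper {
        // Instance field: there is one copy, shared by every thread calling map().
        private final AtomicLong seen = new AtomicLong();

        void map(String key, String value) {
            // Atomic increment, so concurrent calls do not lose updates.
            seen.incrementAndGet();
        }

        long seen() {
            return seen.get();
        }
    }

    // Calls map() on ONE mapper instance from a pool of worker threads.
    static long runDemo(int threads, int records) throws InterruptedException {
        CountingMapper mapper = new CountingMapper();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < records; i++) {
            final int n = i;
            pool.execute(() -> mapper.map("key" + n, "value" + n));
        }
        pool.shutdown();
        if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("worker threads did not finish in time");
        }
        return mapper.seen();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo(4, 10_000)); // prints 10000
    }
}
```

With a plain long field and seen++ instead of the AtomicLong, the final count
could come out lower than the number of records, because increments from
different threads can overwrite each other. A synchronized block would also
work, at the price of more contention between the threads.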

Cheers

On Wed, Apr 28, 2010 at 8:52 AM, Jim Twensky <[email protected]> wrote:

> Thanks Ted. Is it correct to assume that all class members defined
> inside my Mapper are visible to all of the threads, so I should pay
> careful attention and take synchronization into account when accessing
> those objects?
>
> Jim
>
> On Tue, Apr 27, 2010 at 11:50 PM, Ted Yu <[email protected]> wrote:
> > Looking through MultithreadedMapRunner, map() seems to be the only method
> > called by executorService:
> >        MultithreadedMapRunner.this.mapper.map(key, value, output, reporter);
> >
> >
> > On Tue, Apr 27, 2010 at 3:46 PM, Jim Twensky <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> I've decided to refactor some of my Hadoop jobs and implement them
> >> using MultithreadedMapper.class but I got puzzled because of some
> >> unexpected error messages at run time.
> >> Here are some relevant settings regarding my Hadoop cluster:
> >>
> >> mapred.tasktracker.map.tasks.maximum = 1
> >> mapred.tasktracker.reduce.tasks.maximum = 1
> >> mapred.job.reuse.jvm.num.tasks = -1
> >> mapred.map.multithreadedrunner.threads = 4
> >>
> >> I'd like to know how threads are used to run the map task in a single
> >> JVM (Correct me if this is wrong). Suppose I've got a sample Mapper
> >> class as such:
> >>
> >> class Mapper ... {
> >>
> >> MyObject A;
> >> static MyObject B;
> >>
> >> setup() {
> >>   Configuration conf = context.getConfiguration();
> >>   A.initialize(conf);
> >>   B.initialize(conf);
> >> }
> >>
> >> map() {...}
> >>
> >> cleanup() {...}
> >> }
> >>
> >> Does each thread run all three of the setup(), map(), and cleanup() methods?
> >>
> >> -OR-
> >>
> >> Are setup() and cleanup() run once per task (and thus per JVM
> >> according to my settings) and so map is the only multithreaded
> >> function?
> >> Also, are the objects A and B shared among different threads or does
> >> each thread have its own copy of them? My initial guess was that each
> >> thread would have a separate copy of A, and B would be shared among
> >> the 4 threads running on the same box since it is defined as static,
> >> but it appears to me that this assumption is not correct and A seems
> >> to be shared.
> >>
> >> Thanks,
> >> Jim
> >>
> >
>
