That's right. You need to balance the cost of synchronization against the parallelism provided by MultithreadedMapRunner.
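To make the trade-off concrete, here is a minimal plain-Java sketch with no Hadoop dependencies. It models the behaviour quoted below: one mapper object whose map() method is invoked by a 4-thread ExecutorService, so that object's fields are shared by every thread. The WordCountMapper class and its counting logic are hypothetical illustrations, not Hadoop code. Rather than declaring the whole map() synchronized, it leaves per-record work unsynchronized and keeps the shared counts in a ConcurrentHashMap of LongAdder values.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class SharedMapperDemo {

    // Hypothetical stand-in for a Mapper: ONE instance is shared by all
    // worker threads, so its fields are visible to (and mutated by) every thread.
    static class WordCountMapper {
        // Shared state. ConcurrentHashMap + LongAdder make concurrent updates
        // safe without locking the whole map() call.
        private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

        // Called concurrently from several threads.
        void map(String line) {
            // Splitting the line only touches local variables: no locking needed.
            for (String word : line.split("\\s+")) {
                if (word.isEmpty()) continue;
                // Only this step touches shared state; computeIfAbsent is atomic
                // on ConcurrentHashMap and LongAdder.increment() is thread-safe.
                counts.computeIfAbsent(word, k -> new LongAdder()).increment();
            }
        }

        long count(String word) {
            LongAdder adder = counts.get(word);
            return adder == null ? 0 : adder.sum();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        WordCountMapper mapper = new WordCountMapper();          // single shared instance
        ExecutorService pool = Executors.newFixedThreadPool(4);  // cf. threads = 4
        List<String> lines = List.of("a b a", "b c", "a c c", "a");
        for (int i = 0; i < 1000; i++) {
            for (String line : lines) {
                pool.submit(() -> mapper.map(line));
            }
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("a=" + mapper.count("a")
                + " b=" + mapper.count("b") + " c=" + mapper.count("c"));
    }
}
```

Running it prints `a=4000 b=2000 c=3000`. Declaring map() as `synchronized` would also be correct, but only one thread could then run map() at a time, which throws away most of the parallelism; keeping the critical section small is the balance described above.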
Cheers

On Wed, Apr 28, 2010 at 8:52 AM, Jim Twensky wrote:

> Thanks Ted. Is it correct to assume that all class members defined
> inside my Mapper are visible to all of the threads, so I should pay
> careful attention and take synchronization into account when accessing
> those objects?
>
> Jim
>
> On Tue, Apr 27, 2010 at 11:50 PM, Ted Yu wrote:
> > Looking through MultithreadedMapRunner, map() seems to be the only method
> > called by executorService:
> >
> >   MultithreadedMapRunner.this.mapper.map(key, value, output, reporter);
> >
> > On Tue, Apr 27, 2010 at 3:46 PM, Jim Twensky wrote:
> >
> >> Hi,
> >>
> >> I've decided to refactor some of my Hadoop jobs and implement them
> >> using MultithreadedMapper.class, but I got puzzled by some
> >> unexpected error messages at run time.
> >> Here are the relevant settings for my Hadoop cluster:
> >>
> >> mapred.tasktracker.map.tasks.maximum = 1
> >> mapred.tasktracker.reduce.tasks.maximum = 1
> >> mapred.job.reuse.jvm.num.tasks = -1
> >> mapred.map.multithreadedrunner.threads = 4
> >>
> >> I'd like to know how threads are used to run the map task in a single
> >> JVM (correct me if this is wrong). Suppose I've got a sample Mapper
> >> class like this:
> >>
> >> class Mapper ... {
> >>
> >>   MyObject A;
> >>   static MyObject B;
> >>
> >>   setup(Context context) {
> >>     Configuration conf = context.getConfiguration();
> >>     A.initialize(conf);
> >>     B.initialize(conf);
> >>   }
> >>
> >>   map() {...}
> >>
> >>   cleanup() {...}
> >> }
> >>
> >> Does each thread run all three of the setup(), map(), and cleanup()
> >> methods?
> >>
> >> -OR-
> >>
> >> Are setup() and cleanup() run once per task (and thus once per JVM,
> >> given my settings), so that map() is the only multithreaded method?
> >> Also, are the objects A and B shared among different threads, or does
> >> each thread have its own copy of them? My initial guess was that each
> >> thread would have a separate copy of A, and B would be shared among
> >> the 4 threads running on the same box since it is defined as static,
> >> but it appears to me that this assumption is not correct and A seems
> >> to be shared.
> >>
> >> Thanks,
> >> Jim
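Whether A is shared depends on how many mapper objects the runner creates: an instance field belongs to one object, while a static field belongs to the class and is shared by every instance loaded in the same JVM. In the model Ted describes, a single mapper object has its map() called from all threads, so A is effectively shared as well. The plain-Java sketch below contrasts that model with a hypothetical one-object-per-thread setup; MyMapper and run() are illustrative stand-ins, not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class FieldSharingDemo {

    // Hypothetical mapper mirroring A (instance field) and B (static field).
    static class MyMapper {
        final AtomicInteger a = new AtomicInteger();         // one per object
        static final AtomicInteger B = new AtomicInteger();  // one per class

        void map() {
            a.incrementAndGet();
            B.incrementAndGet();
        }
    }

    // Starts `threads` threads. If sharedInstance is true, all of them call
    // map() on one object (the single-mapper model); otherwise each thread
    // gets its own object. Returns the distinct objects that were used.
    static List<MyMapper> run(int threads, int callsPerThread, boolean sharedInstance)
            throws InterruptedException {
        MyMapper shared = new MyMapper();
        List<MyMapper> used = new ArrayList<>();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            MyMapper m = sharedInstance ? shared : new MyMapper();
            if (!used.contains(m)) used.add(m);
            Thread w = new Thread(() -> {
                for (int i = 0; i < callsPerThread; i++) m.map();
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) w.join();  // join() makes the updates visible here
        return used;
    }

    public static void main(String[] args) throws InterruptedException {
        List<MyMapper> one = run(4, 100, true);
        System.out.println("shared instance: objects=" + one.size()
                + " a=" + one.get(0).a.get());
        List<MyMapper> many = run(4, 100, false);
        System.out.println("per-thread instances: objects=" + many.size()
                + " a=" + many.get(0).a.get());
        System.out.println("static B total=" + MyMapper.B.get());
    }
}
```

With one shared object, all four threads increment the same `a` (400 in total); with one object per thread, each `a` reaches only 100. `B` is static, so it counts every call from both runs (800). In either case a field touched by several threads needs thread-safe access, which is why the sketch uses AtomicInteger rather than a plain int.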
