Hello,

I am a bit confused between MultithreadedMapRunner and
MultithreadedMapper classes. Basically I have huge "side data" (4GB)
for the map part and I want it in memory. I don't want each mapper to
load its own copy of that data. So I decided to run only one mapper per
machine and make it multithreaded so that all the cores are
utilized. The side data is read only and can be shared by all threads.
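
To make the intended setup concrete, here is a rough sketch in plain Java.
It is not Hadoop code and uses no Hadoop classes: the class name
SharedSideDataSketch, the method sumLookups, and the tiny three-entry map are
all made up for illustration. The map stands in for the 4GB side data. It
only shows the pattern of loading read-only data once per JVM and letting
several worker threads read it without copying.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedSideDataSketch {
    // Hypothetical stand-in for the large side data: built once per JVM,
    // immutable, and therefore safe to read from many threads.
    private static final Map<String, Integer> SIDE_DATA =
            Map.of("a", 1, "b", 2, "c", 3);

    // Looks up each record in the shared map on a fixed-size thread pool
    // and returns the sum of the values found (missing keys count as 0).
    static int sumLookups(List<String> records, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (String record : records) {
                // Every task reads the same SIDE_DATA instance; nothing is copied.
                Callable<Integer> task = () -> SIDE_DATA.getOrDefault(record, 0);
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("lookup task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumLookups(List.of("a", "b", "c", "x"), 4));
    }
}
```

The sharing is only safe here because the map is never modified after it is
built, which matches the read-only side data described above.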

My question is: which of the MultithreadedMapRunner and
MultithreadedMapper classes should I be using? Or do they have to be used
together (choose MultithreadedMapRunner in the config file and then
extend MultithreadedMapper for map tasks)? I notice that one is in the
mapred package and the other is in the mapreduce package, but neither is
deprecated. I can use the latest version of Hadoop since I am just
starting out.


thanks in advance,


Juber
