Hi Juber,

MultithreadedMapper uses the new API (the mapreduce package) that was introduced 
in branch 0.20, whereas MultithreadedMapRunner uses the old interface (the 
mapred package).
MultithreadedMapRunner was deprecated in branch 0.21 by 
https://issues.apache.org/jira/browse/MAPREDUCE-465.
If you are using branch 0.20, you can use either one of them, but do not use 
them together.
I would prefer MultithreadedMapper, because the other one is deprecated 
in subsequent versions.
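
For illustration, here is a minimal sketch of how a job could be set up with 
MultithreadedMapper under the new API. The class names SideDataJob and 
SideDataMapper, the job name, the thread count of 8 and the map logic are all 
made up for this example; only the Hadoop classes and methods are real. As far 
as I know, each thread gets its own instance of the wrapped mapper, so data 
that should be loaded once and shared would have to live in a static field and 
be accessed in a thread-safe way.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SideDataJob {

  // Hypothetical mapper for illustration. It runs inside several threads
  // at once, so anything it shares between instances must be thread-safe.
  public static class SideDataMapper
      extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      // Placeholder logic: emit every input line under a constant key.
      context.write(new Text("line"), value);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "side-data-job");
    job.setJarByClass(SideDataJob.class);

    // The job's mapper is MultithreadedMapper itself; the real map logic
    // is registered through its static helper methods.
    job.setMapperClass(MultithreadedMapper.class);
    MultithreadedMapper.setMapperClass(job, SideDataMapper.class);
    MultithreadedMapper.setNumberOfThreads(job, 8); // assumed core count

    job.setNumberOfReduceTasks(0); // map-only job, to keep the sketch short
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Note that nothing from the mapred package appears here, which is the point of 
not mixing the two classes.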

Thanks
Amareshwari

On 5/17/10 7:25 AM, "juber patel" <[email protected]> wrote:

Hello,


I am a bit confused between MultithreadedMapRunner and
MultithreadedMapper classes. Basically I have huge "side data" (4GB)
for the map part and I want it in memory. I don't want each mapper to
load its own copy of that data. So I decided to limit it to one mapper per
machine and make it multithreaded so that all the cores are
utilized. The side data is read only and can be shared by all threads.

My question is: Which one of MultithreadedMapRunner and
MultithreadedMapper classes should I be using? Or do they have to be used
together? (choose MultithreadedMapRunner in the config file and then
extend MultithreadedMapper for map tasks). I notice that one is in
mapred package and the other is in mapreduce package but neither is
deprecated. I can use the latest version of Hadoop since I am just
starting up.


thanks in advance,


Juber
