Hi Juber,

MultithreadedMapper uses the new API (the mapreduce package) that was introduced in branch 0.20, whereas MultithreadedMapRunner uses the old interface (the mapred package). MultithreadedMapRunner is deprecated in branch 0.21 through https://issues.apache.org/jira/browse/MAPREDUCE-465.

If you are using branch 0.20, you can use either one, but do not use them together. I would prefer MultithreadedMapper, because the other one is deprecated in subsequent versions.
Thanks
Amareshwari

On 5/17/10 7:25 AM, "juber patel" <[email protected]> wrote:

Hello,

I am a bit confused between the MultithreadedMapRunner and MultithreadedMapper classes.

Basically I have huge "side data" (4GB) for the map part and I want it in memory. I don't want each mapper to load its own copy of that data. So I decided to limit the job to one mapper per machine and make it multithreaded so that all the cores are utilized. The side data is read-only and can be shared by all threads.

My question is: which one of the MultithreadedMapRunner and MultithreadedMapper classes should I be using? Or do they have to be used together (choose MultithreadedMapRunner in the config file and then extend MultithreadedMapper for map tasks)? I notice that one is in the mapred package and the other is in the mapreduce package, but neither is deprecated. I can use the latest version of Hadoop since I am just starting up.

thanks in advance,
Juber
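
The "load the side data once, share it read-only across threads" idea from the question can be sketched in plain Java, without any Hadoop classes. This is only an illustration under stated assumptions: SideData, its three-entry map, and SharedSideDataDemo are hypothetical stand-ins (the map stands for the 4GB data set, the thread pool for the mapper threads), not part of Hadoop or of the original poster's code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the large read-only side data.
final class SideData {
    // Counts how often the data was actually loaded, to check that it is shared.
    static final AtomicInteger LOADS = new AtomicInteger();

    // Initialization-on-demand holder: the JVM runs this initializer once,
    // no matter how many threads call get().
    private static final class Holder {
        static final Map<String, Integer> DATA = load();
    }

    private static Map<String, Integer> load() {
        LOADS.incrementAndGet();
        Map<String, Integer> m = new HashMap<>();
        m.put("a", 1);
        m.put("b", 2);
        m.put("c", 3);
        return Collections.unmodifiableMap(m); // read-only view for all threads
    }

    static Map<String, Integer> get() {
        return Holder.DATA;
    }
}

final class SharedSideDataDemo {
    // Looks each record up in the shared side data using a fixed pool of worker threads.
    static Map<String, Integer> process(List<String> records, int threads) {
        Map<String, Integer> out = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String record : records) {
            pool.submit(() -> {
                Integer value = SideData.get().get(record);
                if (value != null) {
                    out.put(record, value * 10); // stand-in for the real map logic
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> result = process(Arrays.asList("a", "b", "c", "x"), 4);
        System.out.println(result + " loads=" + SideData.LOADS.get());
    }
}
```

In a real job on the new API, the usual wiring would be job.setMapperClass(MultithreadedMapper.class), then MultithreadedMapper.setMapperClass(job, YourMapper.class) and MultithreadedMapper.setNumberOfThreads(job, n), where YourMapper is a placeholder for your own mapper. Since each thread may get its own mapper instance, keeping the side data behind a static holder like the one above is one way to make sure it is loaded only once per JVM.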
