Hello,
I am a bit confused about the difference between the MultithreadedMapRunner and MultithreadedMapper classes.

I have a large amount of "side data" (4 GB) for the map phase, and I want to keep it in memory. I don't want each mapper to load its own copy of that data, so I decided to limit each machine to one mapper and make that mapper multithreaded so that all the cores are utilized. The side data is read-only and can be shared by all threads.

My question is: which of MultithreadedMapRunner and MultithreadedMapper should I be using? Or do they have to be used together (choose MultithreadedMapRunner in the config file and then extend MultithreadedMapper for the map tasks)? I notice that one is in the mapred package and the other is in the mapreduce package, but neither is deprecated. I can use the latest version of Hadoop, since I am just starting out.

Thanks in advance,
Juber
