Thank you, Kevin, for the detailed explanation. I went ahead and shared both. Because I was testing on my own machine, it worked :), but obviously that was a fluke, and I need to change my code to run on the cluster.

Sincerely,
Mark
On Wed, Sep 9, 2009 at 2:57 PM, Kevin Peterson <[email protected]> wrote:
> On Tue, Sep 8, 2009 at 1:16 PM, Mark Kerzner <[email protected]> wrote:
>
> > Hi,
> > I have some code that's common to the main class, the mapper, and the reducer.
> > Can I put it only in the main class and use it from the mapper and reducer?
> >
> > A similar question about static variables in the main class: are they
> > available from the mapper and reducer?
>
> Code yes, data no.
>
> Your mapper and reducer will have the full jar file that contains the job
> (unless you are doing something very strange). You can include any code
> you need to share, just as you would in any other Java app.
>
> You can't pass data in static variables, though. The main class only runs
> on the machine you submit the job from. When the mappers and reducers
> start up, they run in separate JVMs, often not even on the same physical
> node. If you need to distribute a large amount of data, you can use the
> DistributedCache. If you just need to pass some settings, you can do it
> by setting the child opts (mapred.child.java.opts, the options passed to
> the JVMs for the mappers and reducers) in the config. If you need some
> sort of coordination more complicated than this, you should look into
> ZooKeeper.
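To make "code yes, data no" concrete without needing a cluster, here is a minimal stdlib-only sketch. It is an analogy, not Hadoop code: a helper class compiled into the same jar is callable from both the driver and the tasks, but a value only reaches a task if it is written into something that gets serialized and shipped, roughly the way a job's configuration travels to the tasks as a file rather than through shared memory. java.util.Properties stands in for the job configuration here, and the class name ShipSettingsSketch, the helper SharedUtil, and the key "myjob.stopword" are all hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

public class ShipSettingsSketch {

    // "Code yes": any class packaged in the job jar can be called from the
    // driver, the mapper, and the reducer alike. (Hypothetical helper.)
    public static final class SharedUtil {
        public static String normalize(String word) {
            return word.trim().toLowerCase();
        }
    }

    // "Data no" (unless shipped): the driver writes settings into a
    // configuration object and serializes it. A static field set here would
    // NOT be seen by a task, because each task runs in its own JVM.
    public static String driverSerialize(String key, String value) {
        Properties conf = new Properties();
        conf.setProperty(key, value);
        StringWriter out = new StringWriter();
        try {
            conf.store(out, null); // also writes a timestamp comment line
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Task side: rebuild the configuration from the shipped text.
    public static Properties taskDeserialize(String shipped) {
        Properties conf = new Properties();
        try {
            conf.load(new StringReader(shipped)); // comment lines are ignored
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return conf;
    }

    public static void main(String[] args) {
        String shipped = driverSerialize("myjob.stopword", "The");
        Properties taskConf = taskDeserialize(shipped);
        // The task uses shared code on data that arrived via the configuration.
        System.out.println(SharedUtil.normalize(taskConf.getProperty("myjob.stopword")));
    }
}
```

In a real Hadoop job, the equivalent would be setting the value on the job configuration in the driver and reading it back in the mapper's or reducer's setup hook, with the DistributedCache taking over when the payload is too large to fit comfortably in the configuration.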
