Aaron, although your notes are not a ready-made solution, they are a great help.
Thank you,
Mark

On Tue, Oct 27, 2009 at 11:27 PM, Aaron Kimball <[email protected]> wrote:

> There is no in-MapReduce mechanism for cross-task synchronization. You'll
> need to use something like Zookeeper for this, or another external
> database. Note that this will greatly complicate your life.
>
> If I were you, I'd try to either redesign my pipeline elsewhere to eliminate
> this need, or maybe get really clever. For example, do your numbers need to
> be sequential, or just unique?
>
> If the latter, then take the byte offset into the reducer's current output
> file and combine that with the reducer id (e.g.,
> <current-byte-offset><zero-padded-reducer-id>) to guarantee that they're all
> building unique sequences. If the former... rethink your pipeline? :)
>
> - Aaron
>
> On Tue, Oct 27, 2009 at 8:55 PM, Mark Kerzner <[email protected]> wrote:
>
> > Hi,
> >
> > I need to number all output records consecutively, like, 1,2,3...
> >
> > This is no problem with one reducer, making recordId an instance variable
> > in the Reducer class, and setting conf.setNumReduceTasks(1)
> >
> > However, it is an architectural decision forced by processing need, where
> > the reducer becomes a bottleneck. Can I have a global variable for all
> > reducers, which would give each the next consecutive recordId? In the
> > database scenario, this would be the unique autokey. How to do it in
> > MapReduce?
> >
> > Thank you
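
For reference, here is a minimal sketch of the "unique but not sequential" scheme Aaron describes: the reducer's current output byte offset followed by a zero-padded reducer id. It is plain Java with no Hadoop dependency. The class name UniqueRecordIds, the method makeId, and the 5-digit padding width are assumptions made for illustration only. How a reducer obtains its own id and its current output offset is deliberately left out, so both are passed in as ordinary parameters.

```java
final class UniqueRecordIds {
    // Assumed fixed width of the reducer-id suffix. It must be wide enough
    // for the largest reducer id in the job, otherwise ids could collide.
    static final int REDUCER_ID_WIDTH = 5;
    static final int MAX_REDUCER_ID = 99_999;

    // Builds <current-byte-offset><zero-padded-reducer-id>.
    // Within one reducer the offset grows with every non-empty record written,
    // and across reducers the fixed-width suffix differs, so ids do not repeat.
    static String makeId(long byteOffset, int reducerId) {
        if (byteOffset < 0) {
            throw new IllegalArgumentException("byteOffset must be >= 0");
        }
        if (reducerId < 0 || reducerId > MAX_REDUCER_ID) {
            throw new IllegalArgumentException("reducerId out of range");
        }
        String suffix = String.format("%0" + REDUCER_ID_WIDTH + "d", reducerId);
        return byteOffset + suffix;
    }

    public static void main(String[] args) {
        // Simulate two reducers, each writing three records of made-up sizes.
        int[] recordSizes = {120, 48, 300};
        for (int reducerId = 0; reducerId < 2; reducerId++) {
            long offset = 0;
            for (int size : recordSizes) {
                System.out.println(makeId(offset, reducerId));
                offset += size; // the offset advances by the bytes just written
            }
        }
    }
}
```

For example, makeId(1024, 12) yields "102400012". The ids are unique across reducers but have gaps and carry no global order, so this covers only the "just unique" case. Truly consecutive numbering still needs a single reducer or an external coordinator such as Zookeeper, as noted above.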
