Re: How to give consecutive numbers to output records?

Mark Kerzner Tue, 27 Oct 2009 21:35:11 -0700

Aaron, although your notes are not a ready-made solution, they are a great
help.


Thank you,
Mark

On Tue, Oct 27, 2009 at 11:27 PM, Aaron Kimball <[email protected]> wrote:

> There is no in-MapReduce mechanism for cross-task synchronization. You'll
> need to use something like ZooKeeper for this, or another external
> database.
> Note that this will greatly complicate your life.
>
> If I were you, I'd try to either redesign my pipeline elsewhere to eliminate
> this need, or maybe get really clever. For example, do your numbers need to
> be sequential, or just unique?
>
> If the latter, then take the byte offset into the reducer's current output
> file and combine that with the reducer id (e.g.,
> <current-byte-offset><zero-padded-reducer-id>) to guarantee that they're all
> building unique sequences. If the former... rethink your pipeline? :)
>
> - Aaron
>
> On Tue, Oct 27, 2009 at 8:55 PM, Mark Kerzner <[email protected]>
> wrote:
>
> > Hi,
> >
> > I need to number all output records consecutively, like, 1,2,3...
> >
> > This is no problem with one reducer, making recordId an instance variable in
> > the Reducer class, and setting conf.setNumReduceTasks(1).
> >
> > However, this is an architectural decision forced by a processing need, and
> > the single reducer becomes a bottleneck. Can I have a global variable for all
> > reducers, which would give each the next consecutive recordId? In the
> > database scenario, this would be the unique autokey. How to do it in
> > MapReduce?
> >
> > Thank you
> >
>
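
For readers of the archive, Aaron's "unique but not sequential" suggestion
can be sketched roughly as follows. This is only an illustration under
assumptions, not code from the thread: the class name UniqueRecordIds, the
method makeId, and the padding width of 5 digits are all hypothetical, and
the byte offset and reducer id are passed in as plain arguments because the
thread does not say how to obtain them inside a running job. The sketch also
assumes every record writes at least one byte, so offsets within one
reducer's output file never repeat.

```java
import java.util.HashSet;
import java.util.Set;

public class UniqueRecordIds {
    // Hypothetical padding width: must cover the largest reducer id in the job.
    static final int REDUCER_ID_WIDTH = 5;

    // Builds <current-byte-offset><zero-padded-reducer-id>.
    // The fixed-width suffix keeps ids from different reducers apart, and the
    // offset prefix keeps ids within one reducer apart.
    static String makeId(long byteOffset, int reducerId) {
        if (byteOffset < 0 || reducerId < 0) {
            throw new IllegalArgumentException("offset and reducer id must be non-negative");
        }
        if (String.valueOf(reducerId).length() > REDUCER_ID_WIDTH) {
            throw new IllegalArgumentException("reducer id does not fit the padding width");
        }
        return byteOffset + String.format("%0" + REDUCER_ID_WIDTH + "d", reducerId);
    }

    public static void main(String[] args) {
        // Simulate three reducers that each write records at the same offsets.
        long[] offsets = {0L, 17L, 170L};
        Set<String> seen = new HashSet<>();
        for (int reducerId = 0; reducerId < 3; reducerId++) {
            for (long offset : offsets) {
                String id = makeId(offset, reducerId);
                if (!seen.add(id)) {
                    throw new IllegalStateException("duplicate id: " + id);
                }
                System.out.println(id);
            }
        }
    }
}
```

The ids are unique across reducers but have gaps and no global order, so
this only covers the "just unique" case. It does not give the consecutive
1,2,3... numbering from the original question.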
