To my understanding, the pre/post Application/Superstep methods are
called ONCE on a "fake" vertex on each worker (the so-called
representativeVertex). This means that these methods should not depend
on any vertex-specific data.
As I'm trying to sort out my Emitter, I thought I could create one
FSDataOutputStream per worker which each Vertex belonging to that
worker could share (which would be even thread-safe as each worker is
The questions are:
1) how to share the FSDataOutputStream object created at
preApplication() (and closed at postApplication()) created by this
2) about the filename, I'd be happy to have access to the worker ID so
as to create an output filename the way FileOutputFormat does for
reducers and part files (i.e. <userdefinedfilename>-workerid).
The "best" idea I have in mind right now is to use the hashCode of the
calling vertex (the representativeVertex) as the ID, and to create an
external singleton where I can register and request the output files,
similarly to what happens with Aggregators now, passing the *this*
reference as the index into this map. Any better idea?
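
To make the singleton idea concrete, here is a rough sketch of what I
mean. Everything in it is hypothetical: WorkerStreamRegistry, register,
lookup, close and fileNameFor are names I made up, and a plain
java.io.OutputStream stands in for the FSDataOutputStream so that the
snippet compiles without Hadoop on the classpath. It also assumes that
all vertices of a worker live in the same JVM as the registry.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical per-JVM registry of shared output streams (not a Giraph API). */
final class WorkerStreamRegistry {
    private static final WorkerStreamRegistry INSTANCE = new WorkerStreamRegistry();

    // Key: whatever identifies the worker, e.g. the representative vertex
    // or its hashCode (an assumption, a real worker ID would be better).
    private final Map<Object, OutputStream> streams = new ConcurrentHashMap<>();

    private WorkerStreamRegistry() {}

    static WorkerStreamRegistry get() {
        return INSTANCE;
    }

    /** Opens the stream on the first call for a key; later calls return the same stream. */
    OutputStream register(Object key, Function<Object, OutputStream> opener) {
        return streams.computeIfAbsent(key, opener);
    }

    /** Returns the stream registered for the key, or null if there is none. */
    OutputStream lookup(Object key) {
        return streams.get(key);
    }

    /** Removes and closes the stream for the key, if one was registered. */
    void close(Object key) throws IOException {
        OutputStream stream = streams.remove(key);
        if (stream != null) {
            stream.close();
        }
    }

    /** Builds "<userdefinedfilename>-<id>", with the id rendered in hex. */
    static String fileNameFor(String userDefinedName, int id) {
        return userDefinedName + "-" + Integer.toHexString(id);
    }
}
```

The intent would be to call register(...) from preApplication(), with
an opener that wraps something like FileSystem.create(new Path(name)),
to call lookup(...) from the vertices, and to call close(...) from
postApplication(). Note that hashCode values can collide, so using one
as the ID is only a stopgap.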