Hello, to my understanding the pre/post Application/Superstep methods are called ONCE per worker, on a "fake" vertex (the so-called representativeVertex). This means that these methods should not depend on any vertex-specific data.
As I'm trying to sort out my Emitter, I thought I could create one FSDataOutputStream per worker, shared by every Vertex belonging to that worker (this would even be thread-safe, as processing within a worker is not parallel).

The questions are:

1) How can the FSDataOutputStream object that the representativeVertex creates in preApplication() (and closes in postApplication()) be shared with the other vertices?

2) About the filename: I'd be happy to have access to the worker id, so that I can name the output file the way FileOutputFormat names reducer part files (i.e. <userdefinedfilename>-workerid).

The "best" idea I have in mind right now is to use the hashCode of the calling vertex (the representativeVertex) as the id, and to create an external Singleton where I can register and request the output files, similarly to what happens with Aggregators now, passing the *this* reference as the index into its map.

Any better idea? :)

--
Claudio Martella
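
To make the Singleton idea concrete, here is a minimal sketch of what such a registry could look like. Everything in it is an assumption for illustration: WorkerOutputRegistry, fileNameFor, register, get and release are hypothetical names, not Giraph or Hadoop API, and a plain java.io.OutputStream stands in for FSDataOutputStream so the sketch runs without Hadoop. It does not answer how ordinary vertices would get hold of the key object, which is still the open question. Note also that an identity hash code is not guaranteed to be unique across JVMs, so two workers could end up with the same filename.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of Giraph: a JVM-wide registry of output
// streams, keyed by the object that registered them (the "this" reference).
final class WorkerOutputRegistry {
    private static final WorkerOutputRegistry INSTANCE = new WorkerOutputRegistry();

    // Identity-based map: keys are compared by reference, not by equals().
    private final Map<Object, OutputStream> streams = new IdentityHashMap<>();

    private WorkerOutputRegistry() {}

    static WorkerOutputRegistry getInstance() {
        return INSTANCE;
    }

    // Builds "<baseName>-<id>", using the owner's identity hash as a
    // stand-in for a real worker id (assumption: no worker id is available).
    static String fileNameFor(String baseName, Object owner) {
        return baseName + "-" + Integer.toHexString(System.identityHashCode(owner));
    }

    // Intended to be called from preApplication(): opens the stream on the
    // first call for a given owner and returns the same stream afterwards.
    synchronized OutputStream register(Object owner, String baseName,
                                       Function<String, OutputStream> opener) {
        return streams.computeIfAbsent(owner, o -> opener.apply(fileNameFor(baseName, o)));
    }

    // Returns the stream previously registered for this owner.
    synchronized OutputStream get(Object owner) {
        OutputStream out = streams.get(owner);
        if (out == null) {
            throw new IllegalStateException("no stream registered for this owner");
        }
        return out;
    }

    // Intended to be called from postApplication(): closes and forgets the stream.
    synchronized void release(Object owner) {
        OutputStream out = streams.remove(owner);
        if (out == null) {
            return;
        }
        try {
            out.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

final class WorkerOutputRegistryDemo {
    public static void main(String[] args) throws IOException {
        Object representativeVertex = new Object(); // stand-in for the real vertex
        WorkerOutputRegistry registry = WorkerOutputRegistry.getInstance();

        // In-memory stream instead of a file system stream, for illustration only.
        registry.register(representativeVertex, "myoutput", name -> new ByteArrayOutputStream());
        registry.get(representativeVertex).write("vertex 1\n".getBytes(StandardCharsets.UTF_8));
        registry.release(representativeVertex);
    }
}
```

With Hadoop on the classpath, the opener passed to register() would presumably create the file through the FileSystem API instead of returning a ByteArrayOutputStream. The methods are synchronized only as a precaution; if a worker really is single-threaded, that costs little and could be dropped.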
