Re: On pre/post Application/Superstep contract

Avery Ching Sat, 01 Oct 2011 11:10:35 -0700

Can you show me an example of the inner Context class idea? Sounds interesting...

Another question is whether to have the (pre|post)(Application|Superstep)() methods executed once as an aggregate and passed to the workers, or executed per worker. I think the former might be a little expensive, depending on how big the "Context" is. Perhaps executing them per worker makes the most sense. Any other thoughts?

Maybe aggregator methods would be useful as well, say, to do things like write the aggregators for the entire application every now and then. That would probably get executed on the master. I think the current uses of the (pre|post)(Application|Superstep)() methods are fine in the per-worker specific way of thinking.


Avery

On 10/1/11 7:06 AM, Jake Mannix wrote:

On Sat, Oct 1, 2011 at 2:29 AM, Hyunsik Choi <[email protected]<mailto:[email protected]>> wrote:


    Now, that way looks good. Probably, later we could improve that
    like Context of MapReduce.


ooooooh!  I really like that suggestion, actually.  If every BasicVertex has an
inner Context class, we can allow user applications to define/extend their
Context and we can avoid even doing any of this setClass() and reflection
based stuff, if we do it right.  Typesafe context object FTW!

  -jake
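
A minimal sketch of how a typesafe, user-extensible context could look. It is an illustration only and not Giraph's API: every name below (WorkerContext, ContextVertex, EmitterContext, EmittingVertex, ContextSketch) is hypothetical, it uses a generic type parameter rather than a literal inner class, and a plain list stands in for the shared FSDataOutputStream discussed further down the thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-worker context; the hooks mirror the pre/post methods in this thread. */
abstract class WorkerContext {
    void preApplication() {}
    void postApplication() {}
    void preSuperstep() {}
    void postSuperstep() {}
}

/** Hypothetical vertex base class, parameterized by the context type it expects. */
abstract class ContextVertex<C extends WorkerContext> {
    private C context;

    void setContext(C context) { this.context = context; }

    /** Returns the user's own context type, so no cast or reflection is needed. */
    C getContext() { return context; }

    abstract void compute();
}

/** User-defined context holding one shared sink per worker (assumption: a list replaces the stream). */
class EmitterContext extends WorkerContext {
    final List<String> emitted = new ArrayList<String>();
    private boolean open;

    @Override void preApplication() { open = true; }
    @Override void postApplication() { open = false; }

    void emit(String record) {
        if (!open) {
            throw new IllegalStateException("context is not open");
        }
        emitted.add(record);
    }
}

/** User vertex that writes through the shared, statically typed context. */
class EmittingVertex extends ContextVertex<EmitterContext> {
    private final String id;

    EmittingVertex(String id) { this.id = id; }

    @Override void compute() { getContext().emit("vertex-" + id); }
}

/** Simulates what a single worker would do for one superstep. */
class ContextSketch {
    static List<String> runOneSuperstep(List<String> vertexIds) {
        EmitterContext ctx = new EmitterContext();
        ctx.preApplication();
        ctx.preSuperstep();
        for (String id : vertexIds) {
            EmittingVertex v = new EmittingVertex(id);
            v.setContext(ctx);   // every vertex on the worker shares one context
            v.compute();
        }
        ctx.postSuperstep();
        ctx.postApplication();
        return ctx.emitted;
    }
}
```

Under these assumptions, the per-worker state lives in the context object instead of in static variables on the Vertex implementation, and the pre/post hooks run once per worker on the context rather than on a representative vertex.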


    --
    Hyunsik Choi
    Database Lab, Korea University

    On Sat, Oct 1, 2011 at 3:01 AM, Avery Ching <[email protected]
    <mailto:[email protected]>> wrote:
    > It isn't visible (purposefully) since it is internal state.
    >
    > That being said, I believe this type of functionality would be useful.
    > Right now there is a lot of ugly static variables stored in Vertex
    > implementations because of it.  Perhaps we should add another method in
    > GiraphJob
    >
    > final public void setWorkerObjectClass(Class<? extends Configurable> workerObjectClass);
    >
    > Then in BasicVertex
    >
    > public void preApplication(Configurable workerObject);
    > public void postApplication(Configurable workerObject);
    > public void preSuperstep(Configurable workerObject);
    > public void postSuperstep(Configurable workerObject);
    > public Configurable getWorkerObject();
    >
    > Anyone else think of a cleaner way to do it?
    >
    > Avery
    >
    > On 9/30/11 8:42 AM, Claudio Martella wrote:
    >>
    >> afaik getGraphState() is not visible to my object. Or?
    >>
    >> On Fri, Sep 30, 2011 at 5:23 PM, Jake Mannix <[email protected] <mailto:[email protected]>> wrote:
    >>>
    >>> Remember that there's already a "singleton"-like object available to all
    >>> vertices: the GraphState object, which has a handle on the GraphMapper.
    >>> Maybe this is the right place to get your handle on the
    >>> FSDataOutputStream?
    >>>   -jake
    >>> On Fri, Sep 30, 2011 at 7:25 AM, Claudio Martella <[email protected] <mailto:[email protected]>> wrote:
    >>>>
    >>>> Hello,
    >>>>
    >>>> To my understanding, the pre/post Application/Superstep methods are called
    >>>> ONCE on a "fake" vertex on each worker (the so-called
    >>>> representativeVertex). This means that these methods should not depend
    >>>> on any vertex-specific data.
    >>>>
    >>>> As I'm trying to sort out my Emitter, I thought I could create one
    >>>> FSDataOutputStream per worker which each Vertex belonging to that
    >>>> worker could share (which would even be thread-safe, as each worker is
    >>>> not parallel).
    >>>>
    >>>> The questions are:
    >>>>
    >>>> 1) how to share the FSDataOutputStream object created at
    >>>> preApplication() (and closed at postApplication()) by this
    >>>> representativeVertex?
    >>>>
    >>>> 2) about the filename, I'd be happy to have access to the Worker Id so
    >>>> as to create an output filename, as happens with reducers and
    >>>> part files by FileOutputFormat (i.e. <userdefinedfilename>-workerid).
    >>>>
    >>>>
    >>>> The "best" idea I have in mind right now is to use the calling
    >>>> vertex (the representativeVertex) hashCode as the id, and create an
    >>>> external Singleton where I can register and request the
    >>>> output files, similarly to what happens with Aggregators now, by
    >>>> passing the *this* reference as an index to this map. Any better idea?
    >>>> :)
    >>>>
    >>>>
    >>>> --
    >>>>     Claudio Martella
    >>>> [email protected] <mailto:[email protected]>
    >>>
    >>
    >>
    >
    >

Re: On pre/post Application/Superstep contract