Re: Primitives vs Objects (the Movie!)

Avery Ching Wed, 07 Sep 2011 14:26:57 -0700

Haha, this really is turning into a movie =). I'll start warming up thepopcorn.


On 9/7/11 12:51 PM, Jake Mannix wrote:

On Wed, Sep 7, 2011 at 6:33 AM, Avery Ching <[email protected]<mailto:[email protected]>> wrote:
      I'm not sure that this is precisely the right API, but exposing
    the inner SortedMap definitely has a "leaky abstraction" smell to
    it to me, especially where there are no examples or algorithm
    implementations in the codebase which currently *need* this to be
    exposed in such a way.
    We could expose just the Map and provide helper methods as you
    suggest.  It's a thought.  You're right that it's not used in this
    way in any examples.
Maybe a few more examples would help? Cases where you want to do aBSP computation where the total sort (both the vertexes, and the edgesfor each vertex) is required, as is the random access nature of the Map?

I think the range based examples are the ones that immediately come tomind. BSP for graph processing is still pretty new, and I have no ideawhat kind of interesting algorithms will be tried out on this platform.We are still exploring many possible algorithms to run.

    I can see how the Edge object is nice, it does seem like a better
    api then basically just a Map<I,E>.  Why not a sorted
    List<Edge<I,E>>, then, where the Comparator ignores the edge
    value, but only sorts by the dest vertexId?  Because the contract
    of List doesn't *require* the sort order to remain fixed, I
    guess?  This yet again underscores for me the "spidey-sense"
    wrongness of exposing the SortedMap exactly as it is.  Maybe just
    expose the methods on a map that are expected to be needed, and
    methods can be added as time goes on?

    I think Map over List makes sense for the edges to find them
    quickly (my personal opinion).

Yeah, if there are cases where you're doing searches for paths, thisis probably true, some form of random access is probably best.


What about something as simple as:

  /*
   * @return an immutable, sorted view of the target vertices
   */
  public ImmutableList<I> getTargetVertices();

  /*

* @return E edge for the supplied vertex (or null if there isn't anedge to this vertex from the current one)

   */
  public <E> Edge<I,E> getEdge(I vertex);

in conjunction with the already existing

  public boolean addEdge(Edge<I,E> edge);

you have all the actions of a sorted, random-access map, but withoutexposing the way it's stored internally. Why is there addEdge() butnot removeEdge()?

I think the idea was that after returning the map, users could directlymanipulate the map of edges or use the interfaces, there should probablybe a removeEdge() too. I'm starting to feel that we should remove Edgefrom the user perspective, just keep it internally only for the add edgerequests. It just makes things a little more complex to the user (toomany ways to do the same thing). Perhaps the interfaces you specifiedcould hit most of the use cases (getTargetVertices(), getEdgeValue(),addEdge(), and removeEdge()). If there turns out to be a big need, wecan always change it back to a SortedMap or something else more appropriate.

      Problem 1)  static mutable state (ew!) - lots of references to
    Vertex.setSuperStep(), Vertex.setNumVertices(),
    Vertex.setNumEdges(), Vertex.setGraphMapper(), etc... which means
    that implemeting BasicVertex / MutableVertex isn't good enough,
    you have to actually be a subclass of Vertex! :\

    Oh, you found our dirty little secret. =)  Yes, we do have static
    mutable state.  The concern is that if every vertex actually keeps
    track of all those things as members, I think a decent amount of
    memory would be wasted.  I'm sure there is a cleaner way to
    implement this though (i.e. Flyweight pattern perhaps), just never
    got around to fixing it.

Yeah, keeping one extra 4 or 8 bytes of a reference to say, theGraphMapper inside of each Vertex, and having the GraphMapper itselfcentralize all of these generic Vertex attributes might be the way togo. I'll see if I can whip that up as part of my primitive patch.

Cool.

      Problem 2) Vertex.class is directly referenced as the base
    class in places like BspUtils, etc.
    We never expected to have another base Vertex class. =)
True, but in many of these places, it's just the interface type whichis required, right? BaseVertex is what they all implement, is there areason why this can't be what's referred to in these cases?
    Most likely the easiest solution would be to pull out a
    AbstractVertex class, which holds this ugly static state, and
    make said state protected, instead of private, and have Vertex
    and LongDoubleFloatDoubleVertex both subclass it, and change
    reflection references from setClass(VERTEX_CLASS, vertexClass,
    Vertex.class); to setClass(VERTEX_CLASS, vertexClass,
    AbstractVertex.class);
    That sounds reasonable.  One thing we have thought about is
    compressing the vertices in memory at the cost of some CPU.  While
    this is orthogonal, they are both efforts to conserve memory.
Storing the edges as primitive arrays could be viewed as a fairlymaximally efficient compression, in cases where you haveprimitive-able types. I'm not sure how easy it would be to have agenerically compressible SortedMap<I, Edge<E,I>>, in a way whichallowed mutations of the map...

Still only in the thought phase, but I think the idea would be some kindof block compression of vertices (compress say 10k vertices at a timeand keep an index).

    Memory consumption (see above), these are aggregate members for
    all the vertices.
Ok, I'll see what it looks like if this data is moved to somethinglike a VertexState object attached to the GraphMapper, which all thevertexes can have a reference to.

As I've thought more about primitives vs objects, I think the objectflexibility is quite important. The page rank example could probablyget away with primitives, but other algorithms will likely requireobjects for edge values, message values, and vertex values (i.e. maybestoring the inlinks, or a bunch of different values i.e. multiplepersonalized page ranks run simultaneously). I guess you're thinkingthat Giraph will have two separate implementations? One that isprimitives based and the other that is object based?

Re: Primitives vs Objects (the Movie!)

Reply via email to