Re: Primitives vs Objects (the Movie!)

Avery Ching Wed, 07 Sep 2011 15:14:55 -0700

On 9/7/11 3:00 PM, Jake Mannix wrote:

On Wed, Sep 7, 2011 at 9:26 PM, Avery Ching <[email protected]<mailto:[email protected]>> wrote:
    Haha, this really is turning into a movie =).  I'll start warming
    up the popcorn.
Yeah, I've got my co-workers wondering if I'm going to ship any actualproduction code *inside* the company this week, at this rate (*shhhhh*Dmitriy don't tell!)

It's only Wednesday...=)

    On 9/7/11 12:51 PM, Jake Mannix wrote:
    Maybe a few more examples would help?  Cases where you want to do
    a BSP computation where the total sort (both the vertexes, and
    the edges for each vertex) is required, as is the random access
    nature of the Map?
    I think the range based examples are the ones that immediately
    come to mind.  BSP for graph processing is still pretty new, and I
    have no idea what kind of interesting algorithms will be tried out
    on this platform.  We are still exploring many possible algorithms
    to run.


Ok, cool.  I can see how wanting flexibility is important.

    I think the idea was that after returning the map, users could
    directly manipulate the map of edges or use the interfaces, there
    should probably be a removeEdge() too.  I'm starting to feel that
    we should remove Edge from the user perspective, just keep it
    internally only for the add edge requests.  It just makes things a
    little more complex to the user (too many ways to do the same
    thing).  Perhaps the interfaces you specified could hit most of
    the use cases (getTargetVertices(), getEdgeValue(), addEdge(), and
    removeEdge()).  If there turns out to be a big need, we can always
    change it back to a SortedMap or something else more appropriate.

getTargetVertices(), getEdgeValue(), addEdge(), and removeEdge()
sound like the right level of flexibility while keeping the dataencapsulated (so you can try your block compression idea, I can tryout primitives, etc, but the interface remains the same).
        Memory consumption (see above), these are aggregate members
        for all the vertices.


    Ok, I'll see what it looks like if this data is moved to
    something like a VertexState object attached to the GraphMapper,
    which all the vertexes can have a reference to.
    As I've thought more about primitives vs objects, I think the
    object flexibility is quite important.  The page rank example
    could probably get away with primitives, but other algorithms will
    likely require objects for edge values, message values, and vertex
    values (i.e. maybe storing the inlinks, or a bunch of different
values i.e. multiple personalized page ranks run simultaneously).I guess you're thinking that Giraph will have two separate
    implementations?  One that is primitives based and the other that
    is object based?
I'm thinking that with the right interface (like discussed above), youcan have the same base interface, but yeah, for the particular case ofimplementers of BaseVertex<I,V,E,M> where all of I,V, and E arewrappers of primitives, that there is some nice memory savings thatcan be done by keeping them primitive (and only instantiating objects/ autoboxing when accessing via the generic methods, when this istransiently done).
PageRank isn't the only example where I'd want to get the (suspected)perf boost of using primitives, as most cases where I've dealt withgraphs everything gets normalized at some point - the input featuresall get eventually turned into an edge "weight" of some kind, thevertexes themselves maybe keep some small data with them, but theedges just look like a target vertex id and an edge weight. Forexample, for social graphs, you can imagine lots and lots of fancydata you associate with users (geo, language, account freshness,recent text, topical interests, etc...), and lots of things toassociate with the edges (there are many ways users can interact,beyond the explicit "x is connected to y" way), but when you want torun some big monstrous computation, this data is condensed into somefixed final "connection strength" combination of weights. On theother hand, to actually *compute* that connection weight, maybenon-local information gleaned from a graph algorithm would be nice.Similarly, computing a nice big topic-sensitive pagerank mightrequire a bunch of topic-weight metadata at the vertexes.
I don't know why I'm arguing this - I agree with you, keeping the*ability* to do object stuff is important, yes. I'm not advocatingcompletely primitivizing all of the base implementations. I'm justsuggesting that it be added, as that's a pretty common use case whichcould benefit from some low-hanging fruit memory savings.

Sounds right to me, just wanted to make sure that I was understandingcorrectly what you wanted to do.

Re: Primitives vs Objects (the Movie!)

Reply via email to