Marko A. Rodriguez created TINKERPOP-1163:
---------------------------------------------

             Summary: GraphComputers can have TraversalStrategies.
                 Key: TINKERPOP-1163
                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1163
             Project: TinkerPop
          Issue Type: Improvement
          Components: hadoop, process
    Affects Versions: 3.1.0-incubating
            Reporter: Marko A. Rodriguez


@dkuppitz makes the joke that he can count the number of vertices in the 
Friendster adjacency list with "awk to the sed to the bash to the.." in < 1 
minute. SparkGraphComputer with four blades takes ~5 minutes.

What's the dealio?

Imagine a world where {{SparkGraphComputerStrategy}} exists. It analyzes 
traversals and executes them fast by breaking away from the VertexProgram API 
and going straight to the native API of Spark. Check it:

{code}
g.V().count() -> inputRDD.count()
{code}

...add an {{EmptyVertex.instance()}} manipulation to the respective InputFormats 
and you are then just skipping through bytes, not manifesting objects at all. 
BAM. That would take 30 seconds on Friendster.
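To make the {{g.V().count()}} rewrite concrete, here is a minimal sketch in plain Java. Everything in it is a hypothetical stand-in: a "traversal" is just a list of step names and the "input" is a list of adjacency-list lines. It uses no TinkerPop or Spark classes and is not how such a strategy would actually be wired in. It only shows the shape of the idea: match a known pattern, answer natively, otherwise fall back.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in; none of these names exist in TinkerPop or Spark.
final class NativeCountSketch {

    // steps: the traversal reduced to ordered step names, e.g. ["V", "count"].
    // inputLines: adjacency-list records, one vertex per line (assumed format).
    static Optional<Long> tryNativeCount(List<String> steps, List<String> inputLines) {
        if (steps.equals(List.of("V", "count"))) {
            // Pattern matched: count records directly, never building vertex objects.
            return Optional.of((long) inputLines.size());
        }
        // No pattern matched: the caller falls back to the regular VertexProgram path.
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> input = List.of("1\t2,3", "2\t3", "3\t");
        System.out.println(tryNativeCount(List.of("V", "count"), input));
        System.out.println(tryNativeCount(List.of("V", "out", "count"), input));
    }
}
```

Returning an empty {{Optional}} for anything unrecognized keeps the fast path strictly opt-in, so an unmatched traversal behaves exactly as it does today.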

{code}
g.V().outE('knows').count() -> inputRDD.flatMapToPair{edgeComponents}.filter{knows}.count()
{code}

Blazing fast.
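The same flatMap/filter/count pipeline can be sketched with Java streams standing in for the RDD. The record format ({{<vertexId>|<label>:<target>,...}}) and the names {{edgeComponents}} and {{countOutEdges}} are assumptions made for illustration only; a real input format would be parsed differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical stand-in for the native pipeline; a List plays the role of inputRDD.
final class NativeEdgeCountSketch {

    // Splits one record into its raw "label:target" edge strings (assumed format).
    static Stream<String> edgeComponents(String record) {
        String[] parts = record.split("\\|", 2);
        if (parts.length < 2 || parts[1].isEmpty()) {
            return Stream.empty(); // vertex with no outgoing edges
        }
        return Arrays.stream(parts[1].split(","));
    }

    // Counts outgoing edges with the given label without building edge objects.
    static long countOutEdges(List<String> records, String edgeLabel) {
        String prefix = edgeLabel + ":";
        return records.stream()
                .flatMap(NativeEdgeCountSketch::edgeComponents)
                .filter(edge -> edge.startsWith(prefix))
                .count();
    }

    public static void main(String[] args) {
        List<String> records = List.of("1|knows:2,created:3", "2|knows:3", "3|");
        System.out.println(countOutEdges(records, "knows"));
    }
}
```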

...for all those standard patterns, we just do a "native" execution for the 
respective GraphComputer engine. We sidestep object creation, iteration phases, 
views, MapReduce jobs... However, we have to be smart and update the 
{{Memory}} so it looks as if the real VertexProgram executed: 
{{iteration}}, {{runtime}}, {{~reducing}}, etc.
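The {{Memory}} bookkeeping could look roughly like the sketch below. A plain {{Map}} stands in for {{Memory}}, and the method name, the millisecond unit, and the idea that the caller chooses which iteration number to report are all assumptions; what values a real VertexProgram run would leave behind is exactly what the strategy would have to get right.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical stand-in; a Map plays the role of the GraphComputer Memory.
final class MemoryBackfillSketch {

    // Runs the native job, then records the keys a caller would expect to find.
    static Map<String, Object> runAndBackfill(LongSupplier nativeJob, int reportedIteration) {
        long start = System.nanoTime();
        long result = nativeJob.getAsLong();
        long runtimeMillis = (System.nanoTime() - start) / 1_000_000;

        Map<String, Object> memory = new LinkedHashMap<>();
        memory.put("iteration", reportedIteration); // assumption: caller decides what to report
        memory.put("runtime", runtimeMillis);       // assumption: wall-clock milliseconds
        memory.put("~reducing", result);            // assumption: the reduced result lives here
        return memory;
    }

    public static void main(String[] args) {
        System.out.println(runAndBackfill(() -> 3L, 1));
    }
}
```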

Genius.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
