Marko A. Rodriguez created TINKERPOP-1163:
---------------------------------------------
Summary: GraphComputer's can have TraversalStrategies.
Key: TINKERPOP-1163
URL: https://issues.apache.org/jira/browse/TINKERPOP-1163
Project: TinkerPop
Issue Type: Improvement
Components: hadoop, process
Affects Versions: 3.1.0-incubating
Reporter: Marko A. Rodriguez

@dkuppitz makes the joke that he can count the number of vertices in the Friendster adjacency list with "awk to the sed to the bash to the.." in under a minute. SparkGraphComputer with four blades takes ~5 minutes. What's the dealio?

Imagine a world where {{SparkGraphComputerStrategy}} exists. It analyzes traversals and does fast executions, breaking away from the VertexProgram API and going straight to the native API of Spark. Check it:

{code}
g.V().count() -> inputRDD.count()
{code}

...add an {{EmptyVertex.instance()}} manipulation to the respective InputFormats and you are then just skipping through bytes, not manifesting objects at all. BAM. That would take 30 seconds on Friendster.

{code}
g.V().outE('knows').count() --> inputRDD.flatMapToPair{edgeComponents}.filter{knows}.count()
{code}

Blazing fast.

...for all those standard patterns, we just do a "native" execution for the respective GraphComputer engine. We sidestep object creation, iteration phases, views, map reduce jobs...

However, we have to be smart and update the {{Memory}} so it looks as if the real VertexProgram executed: {{iteration}}, {{runtime}}, {{~reducing}}, etc.

Genius.
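To make the idea concrete, here is a minimal, engine-free sketch in Python of what such a strategy might do. It is a toy model, not TinkerPop or Spark API. The traversal is represented as a list of step tuples, the "input RDD" is a plain list of adjacency-list records, and the names {{native_plan}}, {{execute}}, {{count_vertices}} and {{count_out_edges}}, the record layout, and the dict standing in for {{Memory}} are all assumptions made for illustration.

```python
import time

# Hypothetical record layout, one line per vertex:
#   "<vertexId>\t<label>:<targetId>,<label>:<targetId>,..."
# A plain list of such lines stands in for the input RDD.


def count_vertices(records):
    # g.V().count() -> inputRDD.count()
    # One record per vertex, so nothing needs to be parsed.
    return sum(1 for _ in records)


def count_out_edges(records, label):
    # g.V().outE(label).count() -> flatMap to edge components, filter, count
    total = 0
    for record in records:
        _, _, edges = record.partition("\t")
        total += sum(1 for e in edges.split(",") if e.partition(":")[0] == label)
    return total


def native_plan(steps):
    """Return a native executor for a recognised pattern, else None."""
    steps = list(steps)
    if steps == [("V",), ("count",)]:
        return count_vertices
    if (len(steps) == 3 and steps[0] == ("V",) and steps[2] == ("count",)
            and len(steps[1]) == 2 and steps[1][0] == "outE"):
        label = steps[1][1]
        return lambda records: count_out_edges(records, label)
    return None


def execute(steps, records):
    """Run the native plan and fill a dict that imitates Memory."""
    plan = native_plan(steps)
    if plan is None:
        # No shortcut known: a real strategy would leave the traversal alone.
        raise NotImplementedError("fall back to the regular VertexProgram path")
    start = time.perf_counter()
    result = plan(records)
    runtime_ms = (time.perf_counter() - start) * 1000.0
    # Keys taken from the issue text; the iteration value is a placeholder.
    return {"iteration": 0, "runtime": runtime_ms, "~reducing": result}


records = [
    "1\tknows:2,created:3,knows:4",
    "2\t",
    "3\tcreated:1",
    "4\tknows:1",
]
memory = execute([("V",), ("outE", "knows"), ("count",)], records)
print(memory["~reducing"])
```

The point of returning a Memory-like dict instead of the bare count is the one made above: callers should not be able to tell that the VertexProgram was bypassed. Any traversal the matcher does not recognise is left to the regular execution path.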