[ https://issues.apache.org/jira/browse/TINKERPOP-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marko A. Rodriguez updated TINKERPOP-1163:
------------------------------------------
    Fix Version/s: 3.2.0-incubating

> GraphComputer's can have TraversalStrategies.
> ---------------------------------------------
>
>                 Key: TINKERPOP-1163
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1163
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: hadoop, process
>    Affects Versions: 3.1.0-incubating
>            Reporter: Marko A. Rodriguez
>            Assignee: Marko A. Rodriguez
>             Fix For: 3.2.0-incubating
>
>
> @dkuppitz makes the joke that he can count the number of vertices in the Friendster adjacency list with "awk to the sed to the bash to the..." in < 1 minute. SparkGraphComputer with four blades takes ~5 minutes.
> What's the dealio?
> Imagine a world where {{SparkGraphComputerStrategy}} exists. It analyzes traversals and executes them quickly by breaking away from the VertexProgram API and going straight to the native API of Spark. Check it:
> {code}
> g.V().count() -> inputRDD.count()
> {code}
> ...add an {{EmptyVertex.instance()}} manipulation to the respective InputFormats and you are just skipping through bytes, not manifesting objects at all. BAM. That would take 30 seconds on Friendster.
> {code}
> g.V().outE('knows').count() -> inputRDD.flatMapToPair{edgeComponents}.filter{knows}.count()
> {code}
> Blazing fast.
> ...for all those standard patterns, we just do a "native" execution for the respective GraphComputer engine. We sidestep object creation, iteration phases, views, MapReduce jobs... However, we have to be smart about updating the {{Memory}} so it looks as if the real VertexProgram executed: {{iteration}}, {{runtime}}, {{~reducing}}, etc.
> Genius.
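To make the pattern-matching idea concrete, here is a minimal, self-contained Java sketch. It deliberately uses neither Spark nor the TinkerPop API: `NativeCountSketch`, `VertexRow`, `SimpleTraversal`, `tryNativeExecution`, and `sampleInput` are hypothetical names, a `List<VertexRow>` stands in for `inputRDD`, Java streams stand in for the RDD operations, and a plain `Map` stands in for `Memory`. It only illustrates the shape of the approach under those assumptions: recognize a known traversal pattern, compute the answer directly over the raw input, record `iteration`/`runtime` entries, and otherwise return nothing so the normal VertexProgram path runs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class NativeCountSketch {

    // Hypothetical stand-in for one parsed line of an adjacency-list input:
    // a vertex id plus the labels of its outgoing edges. No vertex objects are built.
    record VertexRow(long id, List<String> outEdgeLabels) {}

    // Hypothetical, flattened traversal description: step names such as
    // ["V", "count"] or ["V", "outE", "count"], plus the edge label for outE (may be null).
    record SimpleTraversal(List<String> steps, String edgeLabel) {}

    // Runs the traversal directly over the input if it matches a known pattern.
    // Returns Optional.empty() for anything else, so the caller can fall back
    // to the regular VertexProgram execution.
    static Optional<Long> tryNativeExecution(SimpleTraversal t, List<VertexRow> input,
                                             Map<String, Object> memory) {
        long start = System.currentTimeMillis();
        Long result = null;
        if (t.steps().equals(List.of("V", "count"))) {
            // g.V().count() -> just count the input rows (cf. inputRDD.count()).
            result = (long) input.size();
        } else if (t.steps().equals(List.of("V", "outE", "count"))) {
            // g.V().outE(label).count() -> flatten edge labels, keep matching ones, count.
            result = input.stream()
                    .flatMap(row -> row.outEdgeLabels().stream())
                    .filter(label -> label.equals(t.edgeLabel()))
                    .count();
        }
        if (result == null) {
            return Optional.empty();
        }
        // Make the memory look as if a VertexProgram had run. The iteration value is a
        // placeholder; a real strategy would have to mirror what the real program
        // reports, including internal keys such as ~reducing (omitted here).
        memory.put("iteration", 1);
        memory.put("runtime", System.currentTimeMillis() - start);
        return Optional.of(result);
    }

    // Small made-up sample input: 3 vertices, 4 "knows" edges, 2 "created" edges.
    static List<VertexRow> sampleInput() {
        return List.of(
                new VertexRow(1, List.of("knows", "knows", "created")),
                new VertexRow(2, List.of("created")),
                new VertexRow(3, List.of("knows", "knows")));
    }

    public static void main(String[] args) {
        Map<String, Object> memory = new HashMap<>();
        List<VertexRow> input = sampleInput();
        // Optional[3]
        System.out.println(tryNativeExecution(
                new SimpleTraversal(List.of("V", "count"), null), input, memory));
        // Optional[4]
        System.out.println(tryNativeExecution(
                new SimpleTraversal(List.of("V", "outE", "count"), "knows"), input, memory));
        // Optional.empty (not a recognized pattern, so fall back)
        System.out.println(tryNativeExecution(
                new SimpleTraversal(List.of("V", "out", "count"), null), input, memory));
    }
}
```

Returning `Optional.empty()` for unrecognized traversals keeps the fast path strictly optional: only the handful of "standard patterns" get a native shortcut, and everything else still goes through the full VertexProgram machinery.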