Marko A. Rodriguez created TINKERPOP3-1009:
----------------------------------------------
Summary: Add a CAUTION to documentation about HadoopGraph and
getting back elements
Key: TINKERPOP3-1009
URL: https://issues.apache.org/jira/browse/TINKERPOP3-1009
Project: TinkerPop 3
Issue Type: Improvement
Components: documentation, hadoop
Affects Versions: 3.1.0-incubating
Reporter: Marko A. Rodriguez
Fix For: 3.1.1-incubating
This works, but it's crazy to do for large data over non-random-access sources.
{code}
// g is a SparkGraphComputer traversal
gremlin> g.V().out().out()
==>v[3]
==>v[5]
gremlin>
{code}
Why is this crazy? Because for each vertex returned, there is a {{graph.vertices(id)}}
lookup, which for HadoopGraph is a linear scan of the input format. This is
nuts for massive graphs.
{code}
gremlin> g.V().out().out().toList().get(0).getClass()
==>class org.apache.tinkerpop.gremlin.hadoop.structure.HadoopVertex
{code}
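To make the cost concrete, here is a toy Java model. It is not TinkerPop code, and the class and method names are made up for illustration. It assumes the input format can only be read front to back, so each {{graph.vertices(id)}}-style lookup scans records until it finds the id. Returning k vertices from an n-record input then costs on the order of k * n record reads.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy cost model, NOT TinkerPop code: the "input format" is an array that can
 * only be read front to back, so resolving one vertex id means scanning records
 * until that id turns up (the role graph.vertices(id) plays for HadoopGraph).
 */
class LinearScanCost {

    /** Total records read across all lookups. */
    static long recordsRead = 0;

    /** Hypothetical stand-in for graph.vertices(id) over a sequential input. */
    static long lookup(long[] input, long id) {
        for (long record : input) {
            recordsRead++;
            if (record == id) {
                return record;
            }
        }
        throw new IllegalArgumentException("no vertex with id " + id);
    }

    /** Turn every result id back into a "vertex": one scan per id. */
    static List<Long> attachAll(long[] input, long[] resultIds) {
        List<Long> vertices = new ArrayList<>();
        for (long id : resultIds) {
            vertices.add(lookup(input, id));
        }
        return vertices;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        long[] input = new long[n];
        for (int i = 0; i < n; i++) {
            input[i] = i;
        }
        // Three results whose ids sit near the end of the input (worst case).
        long[] resultIds = {n - 1, n - 2, n - 3};
        attachAll(input, resultIds);
        // Prints 2999997: roughly 3 * n records read to return 3 vertices.
        System.out.println(recordsRead);
    }
}
```

If only ids are returned, nothing like {{attachAll}} runs and the input is never rescanned after the job finishes.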
In our docs, we should state that HadoopGraph should be used to generate
reductions, not to return swathes of vertices. If you do need vertices, don't
return the vertices themselves; return ONLY their IDs.
{code}
gremlin> g.V().out().out().id()
==>3
==>5
{code}
Finally, note that {{TraversalVertexProgram}} has a configuration that we
never exposed to users; we should expose it via
{{gremlin.traversalVertexProgram.attachElements}}.
https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ComputerResultStep.java#L56
https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ComputerResultStep.java#L87-L90
As it stands, {{attachElements}} is always {{TRUE}}.
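A hypothetical sketch of what exposing {{attachElements}} would buy. This is not the ComputerResultStep code linked above; the class, the method, and modelling the returned elements as bare Long ids are all assumptions for illustration. With the flag on, every element costs one lookup (for HadoopGraph, a scan of the input format); with it off, the id-only references are handed back untouched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongFunction;

/**
 * Hypothetical sketch, NOT the actual ComputerResultStep code: what an exposed
 * attachElements switch would decide for each element coming back from the
 * computer job. Elements arrive as id-only references (modelled as Longs).
 */
class AttachToggle {

    /** Counts how many times the expensive lookup was invoked. */
    static int lookups = 0;

    static List<Object> results(long[] ids, boolean attachElements, LongFunction<Object> lookup) {
        List<Object> out = new ArrayList<>();
        for (long id : ids) {
            if (attachElements) {
                lookups++;
                out.add(lookup.apply(id)); // e.g. a linear scan over the input format
            } else {
                out.add(id); // hand back the reference untouched: no lookup
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] ids = {3, 5};
        LongFunction<Object> lookup = id -> "v[" + id + "]";

        System.out.println(results(ids, true, lookup));  // [v[3], v[5]]
        System.out.println(results(ids, false, lookup)); // [3, 5]
        System.out.println(lookups);                     // 2
    }
}
```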
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)