Marko A. Rodriguez created TINKERPOP3-1009:
----------------------------------------------

             Summary: Add a CAUTION to documentation about HadoopGraph and 
getting back elements
                 Key: TINKERPOP3-1009
                 URL: https://issues.apache.org/jira/browse/TINKERPOP3-1009
             Project: TinkerPop 3
          Issue Type: Improvement
          Components: documentation, hadoop
    Affects Versions: 3.1.0-incubating
            Reporter: Marko A. Rodriguez
             Fix For: 3.1.1-incubating


This works, but it's crazy to do for large data over non-random-access sources.

{code}
// g is a SparkGraphComputer traversal
gremlin> g.V().out().out()
==>v[3]
==>v[5]
gremlin>
{code}

Why is this crazy? Because for each vertex returned, there is a 
{{graph.vertices(id)}} lookup which, for HadoopGraph, is a linear scan of the 
input format. This is nuts for massive graphs.

{code}
gremlin> g.V().out().out().toList().get(0).getClass()
==>class org.apache.tinkerpop.gremlin.hadoop.structure.HadoopVertex
{code}
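
To put a rough number on the lookup cost described above, here is a minimal toy model in plain Java. It is not TinkerPop code: the class {{LinearScanCost}} and its methods are hypothetical names invented for illustration, and it assumes a scan reads records from the start of the input and stops at the first match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only (hypothetical, not TinkerPop code): cost of one linear scan per result. */
public class LinearScanCost {

    /** Returns how many records are read when scanning from the start to find one id. */
    public static long scanFor(List<Long> records, long id) {
        long read = 0;
        for (long record : records) {
            read++;
            if (record == id) {
                break; // assumption: the scan stops at the first match
            }
        }
        return read;
    }

    /** Returns the total records read when every result id triggers its own scan. */
    public static long totalReads(List<Long> records, List<Long> resultIds) {
        long total = 0;
        for (long id : resultIds) {
            total += scanFor(records, id);
        }
        return total;
    }

    public static void main(String[] args) {
        // Stand-in for a non-random-access input: one million records with ids 1..1,000,000.
        List<Long> records = new ArrayList<>();
        for (long i = 1; i <= 1_000_000; i++) {
            records.add(i);
        }
        // Three result ids that each need to be looked up again.
        List<Long> resultIds = Arrays.asList(3L, 5L, 1_000_000L);
        System.out.println("records read: " + totalReads(records, resultIds));
    }
}
```

In this model the work grows with (number of results) x (size of the input), which is why returning many full vertices gets expensive while a reduction or a plain ID needs no lookup at all.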

In our docs, we should state that you should use HadoopGraph to generate 
reductions and not just swathes of vertices. Or, if you need a vertex, don't 
get the vertex, get ONLY its ID.

{code}
gremlin> g.V().out().out().id()
==>3
==>5
{code}

Finally, note that {{TraversalVertexProgram}} has a configuration that we 
never exposed to the user, but we should, via 
{{gremlin.traversalVertexProgram.attachElements}}.

https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ComputerResultStep.java#L56

https://github.com/apache/incubator-tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/computer/traversal/step/map/ComputerResultStep.java#L87-L90

As we have it now, {{attachElements}} is always {{TRUE}}.



