[jira] [Commented] (TINKERPOP-1655) SparkGraphComputer returns vertices without properties

Matthew Stahl (JIRA) Wed, 22 Mar 2017 07:25:05 -0700

    [ https://issues.apache.org/jira/browse/TINKERPOP-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15936390#comment-15936390 ]


Matthew Stahl commented on TINKERPOP-1655:
------------------------------------------

With that simple case, adding HaltedTraverserStrategy.detached() makes it work. However, with a more
complex query and SparkGraphComputer, the vertices in the resulting tree still have no properties.
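
For reference, the simple case from the issue description with the strategy added might look roughly like the sketch below. It assumes the same withStrategies(HaltedTraverserStrategy.detached()) call used in the Spark query further down; firstVertexDetached is a hypothetical helper name, written as a function so it can also be tried against TinkerGraph's own computer.

{code}
import org.apache.tinkerpop.gremlin.process.computer.Computer
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource
import org.apache.tinkerpop.gremlin.process.traversal.strategy.decoration.HaltedTraverserStrategy
import org.apache.tinkerpop.gremlin.structure.Vertex

// Hypothetical helper: fetch the first vertex through a GraphComputer (OLAP),
// asking for halted traversers to be returned as detached elements, which
// carry their properties.
def firstVertexDetached(g: GraphTraversalSource, computer: Computer): Vertex =
  g.withComputer(computer)
    .withStrategies(HaltedTraverserStrategy.detached())
    .V()
    .next(1)
    .get(0)

// With the HadoopGraph and SparkGraphComputer from the issue description:
// val v3 = firstVertexDetached(graph.traversal(), computer)
// printf("vertex id = %s, keys = %s\n", v3.id, v3.keys())
{code}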

Create a simple tree:
{code}
// create a process-tree graph, and write out as json
// 
// root +-> r_c1 -> r_c1_c1
//      \-> r_c2
//
import org.apache.tinkerpop.gremlin.structure.T
import org.apache.tinkerpop.gremlin.tinkergraph.structure._

val graph = TinkerGraph.open()
val root = graph.addVertex(T.label, "process", T.id, new java.lang.Long(1), "name", "root")
val r_c1 = graph.addVertex(T.label, "process", T.id, new java.lang.Long(2), "name", "r_c1")
val r_c2 = graph.addVertex(T.label, "process", T.id, new java.lang.Long(3), "name", "r_c2")
val r_c1_c1 = graph.addVertex(T.label, "process", T.id, new java.lang.Long(4), "name", "r_c1_c1")
r_c1.addEdge("childof", root, T.id, new java.lang.Long(5))
r_c2.addEdge("childof", root, T.id, new java.lang.Long(6))
r_c1_c1.addEdge("childof", r_c1, T.id, new java.lang.Long(7))

import org.apache.tinkerpop.gremlin.structure.io.graphson._

GraphSONIo.build.graph(graph).create.writeGraph("/tmp/process-tree.json")
{code}
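
The Spark query further down does not show how its graph was opened. One plausible setup is sketched below. It assumes the GraphSON file written above is readable by Hadoop at the same path (for instance, with spark.master set to local[1] and the local filesystem as the default), and it reuses the properties from the issue description with GraphSONInputFormat in place of GryoInputFormat. The names sparkProps and sparkConf are illustrative only.

{code}
import org.apache.commons.configuration.BaseConfiguration

// Same keys as the issue description, but reading the GraphSON file
// produced by writeGraph above instead of grateful-dead.kryo.
val sparkProps = Map[String, String](
  "gremlin.graph" -> "org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph",
  "gremlin.hadoop.graphReader" -> "org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONInputFormat",
  "gremlin.hadoop.inputLocation" -> "/tmp/process-tree.json",
  "gremlin.hadoop.outputLocation" -> "output",
  "gremlin.hadoop.jarsInDistributedCache" -> "true",
  "spark.master" -> "local[1]",
  "spark.executor.memory" -> "1g",
  "spark.serializer" -> "org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer"
)

val sparkConf = new BaseConfiguration()
sparkProps.foreach { case (k, v) => sparkConf.setProperty(k, v) }

// Then, as in the issue description:
// val hadoopGraph = GraphFactory.open(sparkConf)
// val computer = Computer.compute(classOf[SparkGraphComputer])
{code}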

The standard TinkerGraph traversal (no GraphComputer) works:
{code}
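import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__.{has, out}
import org.apache.tinkerpop.gremlin.structure.Vertex
import scala.collection.JavaConversions._ // lets res.foreach iterate the java.util.Map-based Tree

val res = graph.traversal().V()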
                .has("name", "r_c1_c1")
                .repeat(out("childof"))
                .until(has("name", "root"))
                .emit(has("name", "root"))
                .tree()
                .next()
//Pretty-print the results
res.foreach(r => {
    // println(r)
    val child = r._1.asInstanceOf[Vertex]
    val parent = r._2.getLeafObjects.get(0).asInstanceOf[Vertex]
    try {
        printf("( %s, %s ) childof ( %s, %s )\n", child.id(), child.value("name"), parent.id(), parent.value("name"))
    } catch {
        case e:Exception => {
            printf("child: {id = %s, keys = %s}, parent: {id = %s, keys = %s}\n", child.id(), child.keys(), parent.id(), parent.keys())
        }
    }
}) 

// output:
// res: org.apache.tinkerpop.gremlin.process.traversal.step.util.Tree[_] = {v[4]={v[2]={v[1]={}}}}
// ( 4, r_c1_c1 ) childof ( 1, root )
{code}
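
The pretty-printer above relies on an exception to detect a missing property. A non-throwing alternative, sketched here with a hypothetical describe helper, checks Vertex.property(key).isPresent and falls back to listing the vertex's keys.

{code}
import org.apache.tinkerpop.gremlin.structure.Vertex
import scala.collection.JavaConverters._

// Hypothetical helper: "( id, name )" when the property exists, otherwise
// "{id = ..., keys = [...]}" so property-less vertices are easy to spot.
def describe(v: Vertex, key: String = "name"): String = {
  val p = v.property[AnyRef](key)
  if (p.isPresent) s"( ${v.id()}, ${p.value()} )"
  else {
    val keys = v.keys().asScala.toList.sorted.mkString("[", ", ", "]")
    s"{id = ${v.id()}, keys = $keys}"
  }
}

// Inside the foreach above, instead of the try/catch:
// printf("%s childof %s\n", describe(child), describe(parent))
{code}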

However, with SparkGraphComputer the vertices in the resulting tree have no properties:
{code}
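// computer is the SparkGraphComputer from the issue description:
//   val computer = Computer.compute(classOf[SparkGraphComputer])
import org.apache.tinkerpop.gremlin.process.traversal.strategy.decoration.HaltedTraverserStrategy

val res = graph.traversal()
                .withComputer(computer)
                .withStrategies(HaltedTraverserStrategy.detached())
                .V()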
                .has("name", "r_c1_c1")
                .repeat(out("childof"))
                .until(has("name", "root"))
                .emit(has("name", "root"))
                .tree()
                .next()
//Pretty-print the results
res.foreach(r => {
    // println(r)
    val child = r._1.asInstanceOf[Vertex]
    val parent = r._2.getLeafObjects.get(0).asInstanceOf[Vertex]
    try {
        printf("( %s, %s ) childof ( %s, %s )\n", child.id(), child.value("name"), parent.id(), parent.value("name"))
    } catch {
        case e:Exception => {
            println(s"Caught exception: $e")
            printf("child: {id = %s, keys = %s}, parent: {id = %s, keys = %s}\n", child.id(), child.keys(), parent.id(), parent.keys())
        }
    }
})

// Fails:
// res: org.apache.tinkerpop.gremlin.process.traversal.step.util.Tree[_] = {v[4]={v[2]={v[1]={}}}}
// Caught exception: java.lang.IllegalStateException: The property does not exist as the key has no associated value for the provided element: v[4]:name
// child: {id = 4, keys = []}, parent: {id = 1, keys = []}
{code}
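
Only the first exception is visible in that output. To list every vertex in a returned tree that lacks a given key, a small recursive walk over the Tree (a java.util.HashMap from each object to its subtree) is enough. This is a sketch; treeVertices and missingKey are hypothetical helpers, not TinkerPop API.

{code}
import org.apache.tinkerpop.gremlin.structure.Vertex
import scala.collection.JavaConverters._

// Hypothetical helper: walk the tree depth-first (parent before its children)
// and collect the vertex stored as the key at every level.
def treeVertices(tree: java.util.Map[_, _]): List[Vertex] = {
  val m = tree.asInstanceOf[java.util.Map[AnyRef, AnyRef]]
  m.entrySet().asScala.toList.flatMap { e =>
    e.getKey.asInstanceOf[Vertex] :: treeVertices(e.getValue.asInstanceOf[java.util.Map[_, _]])
  }
}

// Hypothetical helper: the vertices in the tree that have no value for key.
def missingKey(tree: java.util.Map[_, _], key: String): List[Vertex] =
  treeVertices(tree).filterNot(_.keys().contains(key))

// missingKey(res, "name") lists the vertices that came back without a name.
{code}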


> SparkGraphComputer returns vertices without properties
> ------------------------------------------------------
>
>                 Key: TINKERPOP-1655
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1655
>             Project: TinkerPop
>          Issue Type: Bug
>    Affects Versions: 3.3.0
>         Environment: /usr/lib/spark/jars/spark-core_2.11-2.0.2.jar
>            Reporter: Matthew Stahl
>
> Spark 2.0 + tinkerpop-3.3.0
> A simple program that pulls out the first vertex of the grateful-dead.kryo
> dataset and prints its property keys works with the standard computer, but
> when it is processed with SparkGraphComputer, the set of keys is empty.
> {code}
> // pre-requisite:
>     // sudo -u zeppelin hadoop fs -copyFromLocal /tmp/grateful-dead.kryo grateful-dead.kryo
>     
>     val inputHdfsLocation = "grateful-dead.kryo"
>     val props = Map[String, String](
>           "gremlin.graph" -> "org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph"
>         , "gremlin.hadoop.graphReader" -> "org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat"
>         , "gremlin.hadoop.inputLocation" -> inputHdfsLocation
>         , "gremlin.hadoop.outputLocation" -> "output"
>         , "gremlin.hadoop.jarsInDistributedCache" -> "true"
>         , "spark.master" -> "local[1]"
>         , "spark.executor.memory" -> "1g"
>         , "spark.serializer" -> "org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer"
>         // , "spark.kryo.registrator" -> "org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoRegistrator"
>     )
>     
>     import org.apache.commons.configuration._
>     
>     val conf = new BaseConfiguration()
>     props.foreach( kv => conf.addProperty(kv._1, kv._2))
>     
>     import org.apache.tinkerpop.gremlin.process.computer._
>     import org.apache.tinkerpop.gremlin.spark.process.computer._
>     import org.apache.tinkerpop.gremlin.structure.util._
>     val graph = GraphFactory.open(conf)
>     val v = graph.traversal().V().next(1).get(0)
>     printf("vertex id = %s, keys = %s\n", v.id, v.keys())
>     
>     val computer = Computer.compute(classOf[SparkGraphComputer])
>     val v2 = graph.traversal().withComputer(computer).V().next(1).get(0)
>     printf("vertex id = %s, keys = %s\n", v2.id, v2.keys())
> {code}
> Above produces:
> {code}
> inputHdfsLocation: String = grateful-dead.kryo
> props: scala.collection.immutable.Map[String,String] = Map(spark.serializer 
> -> org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer, 
> gremlin.hadoop.inputLocation -> grateful-dead.kryo, 
> gremlin.hadoop.jarsInDistributedCache -> true, gremlin.hadoop.graphReader -> 
> org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat, 
> gremlin.graph -> org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph, 
> gremlin.hadoop.outputLocation -> output, spark.master -> local[1], 
> spark.executor.memory -> 1g)
> import org.apache.commons.configuration._
> conf: org.apache.commons.configuration.BaseConfiguration = org.apache.commons.configuration.BaseConfiguration@1849d0b7
> import org.apache.tinkerpop.gremlin.process.computer._
> import org.apache.tinkerpop.gremlin.spark.process.computer._
> import org.apache.tinkerpop.gremlin.structure.util._
> graph: org.apache.tinkerpop.gremlin.structure.Graph = hadoopgraph[gryoinputformat->no-writer]
> v: org.apache.tinkerpop.gremlin.structure.Vertex = v[1]
> vertex id = 1, keys = [name, songType, performances]
> computer: org.apache.tinkerpop.gremlin.process.computer.Computer = sparkgraphcomputer
> v2: org.apache.tinkerpop.gremlin.structure.Vertex = v[1]
> vertex id = 1, keys = []
> {code}
> Notice the empty set of keys when run with SparkGraphComputer, compared with
> the correct set when using the standard computer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
