[ https://issues.apache.org/jira/browse/TINKERPOP-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15327452#comment-15327452 ]
Marko A. Rodriguez commented on TINKERPOP-1335:
-----------------------------------------------

Note that I just confirmed via a test case with {{GryoInputFormat}} that the wrong answer is produced by {{SparkGraphComputer}}. This is good as now we can isolate this to TinkerPop solely and can test it in our test suite without the need for Spark Server infrastructure.

> OLAP queries potentially fail for certain match()/select() query patterns
> -------------------------------------------------------------------------
>
>                 Key: TINKERPOP-1335
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1335
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: hadoop, process
>    Affects Versions: 3.2.0-incubating
>            Reporter: Daniel Kuppitz
>            Assignee: Marko A. Rodriguez
>
> There are certain queries that return wrong results when executed via
> {{SparkGraphComputer}}. After testing a few queries I would say that the
> problematic query pattern is a {{match()}} / {{select()}} combo.
> For example (Grateful Dead graph):
> {code}
> gremlin> g.V().hasLabel("song").match(
>            __.as("a").values("name").as("name"),
>            __.as("a").values("performances").as("performances")
>          ).select("name","performances").count()
> ==>0
> {code}
> If {{count()}} is replaced by {{program()}}, the whole thing is going to
> throw exceptions. However, if we select {{a}} instead of {{name}} and
> {{performances}}, we get the correct result. Likewise, if we remove the
> {{select()}} or just rewrite the {{match()}} part, everything works as
> expected. The simplest query to reproduce the erroneous behavior is this one:
> {code}
> g.V().match(__.as("a").values("name").as("name")).select("name").count()
> {code}
> The tests were done using a real Spark Server. I didn't try to use Spark in
> local mode or Giraph. I did try {{TinkerGraphComputer}}, which worked fine.
> Here's an actual stacktrace that shows where to find the root of all evil:
> {noformat}
> ERROR 2016-06-09 21:24:25,988 Logging.scala:95 - org.apache.spark.executor.Executor: Exception in task 0.2 in stage 119.0 (TID 307)
> java.lang.IllegalStateException: The host of the object is unknown: {a=v[{~label=Comment, member_id=2034, community_id=1676454656}], content=ok, length=2}:java.util.LinkedHashMap
>     at org.apache.tinkerpop.gremlin.process.computer.traversal.WorkerExecutor.getHostingVertex(WorkerExecutor.java:242) ~[gremlin-core-3.2.1-20160601-aa673db1.jar:3.2.1-20160601-aa673db1]
>     at org.apache.tinkerpop.gremlin.process.computer.traversal.WorkerExecutor.lambda$drainStep$262(WorkerExecutor.java:220) ~[gremlin-core-3.2.1-20160601-aa673db1.jar:3.2.1-20160601-aa673db1]
>     at org.apache.tinkerpop.gremlin.process.computer.traversal.WorkerExecutor$$Lambda$113/1202183304.accept(Unknown Source) ~[na:na]
>     at java.util.Iterator.forEachRemaining(Iterator.java:116) ~[na:1.8.0_40]
>     at org.apache.tinkerpop.gremlin.process.computer.traversal.WorkerExecutor.drainStep(WorkerExecutor.java:215) ~[gremlin-core-3.2.1-20160601-aa673db1.jar:3.2.1-20160601-aa673db1]
>     at org.apache.tinkerpop.gremlin.process.computer.traversal.WorkerExecutor.execute(WorkerExecutor.java:146) ~[gremlin-core-3.2.1-20160601-aa673db1.jar:3.2.1-20160601-aa673db1]
>     at org.apache.tinkerpop.gremlin.process.computer.traversal.TraversalVertexProgram.execute(TraversalVertexProgram.java:285) ~[gremlin-core-3.2.1-20160601-aa673db1.jar:3.2.1-20160601-aa673db1]
>     at org.apache.tinkerpop.gremlin.spark.process.computer.SparkExecutor.lambda$null$9(SparkExecutor.java:111) ~[spark-gremlin-3.2.1-20160601-aa673db1.jar:3.2.1-20160601-aa673db1]
>     at org.apache.tinkerpop.gremlin.spark.process.computer.SparkExecutor$$Lambda$92/910806192.apply(Unknown Source) ~[na:na]
>     at org.apache.tinkerpop.gremlin.util.iterator.IteratorUtils$3.next(IteratorUtils.java:247) ~[gremlin-core-3.2.1-20160601-aa673db1.jar:3.2.1-20160601-aa673db1]
>     at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) ~[scala-library-2.10.6.jar:na]
>     at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389) ~[scala-library-2.10.6.jar:na]
>     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) ~[scala-library-2.10.6.jar:na]
>     at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:189) ~[spark-core_2.10-1.6.1.2.jar:1.6.1.2]
>     at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:64) ~[spark-core_2.10-1.6.1.2.jar:1.6.1.2]
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) ~[spark-core_2.10-1.6.1.2.jar:1.6.1.2]
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) ~[spark-core_2.10-1.6.1.2.jar:1.6.1.2]
>     at org.apache.spark.scheduler.Task.run(Task.scala:89) ~[spark-core_2.10-1.6.1.2.jar:1.6.1.2]
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) ~[spark-core_2.10-1.6.1.2.jar:1.6.1.2]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_40]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)