[ https://issues.apache.org/jira/browse/TINKERPOP-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253038#comment-15253038 ]
Marko A. Rodriguez commented on TINKERPOP-1271:
-----------------------------------------------

I've seen this before too. Do you have a recommended solution? Perhaps PR :) ... if not, just some more direction, please.

> SparkContext should be restarted if Killed and using Persistent Context
> -----------------------------------------------------------------------
>
>                 Key: TINKERPOP-1271
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1271
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: hadoop
>    Affects Versions: 3.2.0-incubating, 3.1.2-incubating
>            Reporter: Russell Alexander Spitzer
>
> If the persisted Spark Context is killed by the user via the Spark UI or is
> terminated for some other error the Gremlin Console/Server is left with a
> stopped Spark Context. This could be caught and the spark context recreated.
> Oddly enough if you simply wait the context will "reset" itself or possibly
> get GC'd out of the system and everything works again.
> ##Repo
> {code}
> gremlin> g.V().count()
> WARN  org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer - HADOOP_GREMLIN_LIBS is not set -- proceeding regardless
> ==>6
> gremlin> ERROR org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend - Application has been killed. Reason: Master removed our application: KILLED
> ERROR org.apache.spark.scheduler.TaskSchedulerImpl - Lost executor 0 on 10.150.0.180: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
> // Driver has been killed here via the Master UI
> gremlin> graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
> ==>hadoopgraph[gryoinputformat->gryooutputformat]
> gremlin> g.V().count()
> WARN  org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer - HADOOP_GREMLIN_LIBS is not set -- proceeding regardless
> java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.
> This stopped SparkContext was created at:
> org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
> org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:53)
> org.apache.tinkerpop.gremlin.spark.structure.io.SparkContextStorage.open(SparkContextStorage.java:60)
> org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:122)
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
> The currently active SparkContext was created at:
> org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
> org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:53)
> org.apache.tinkerpop.gremlin.spark.structure.io.SparkContextStorage.open(SparkContextStorage.java:60)
> org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:122)
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
> {code}
> Full trace from TP
> {code}
>     at org.apache.spark.SparkContext.org$apache$spark$SparkContext$$assertNotStopped(SparkContext.scala:106)
>     at org.apache.spark.SparkContext$$anonfun$newAPIHadoopRDD$1.apply(SparkContext.scala:1130)
>     at org.apache.spark.SparkContext$$anonfun$newAPIHadoopRDD$1.apply(SparkContext.scala:1129)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>     at org.apache.spark.SparkContext.withScope(SparkContext.scala:714)
>     at org.apache.spark.SparkContext.newAPIHadoopRDD(SparkContext.scala:1129)
>     at org.apache.spark.api.java.JavaSparkContext.newAPIHadoopRDD(JavaSparkContext.scala:507)
>     at org.apache.tinkerpop.gremlin.spark.structure.io.InputFormatRDD.readGraphRDD(InputFormatRDD.java:42)
>     at org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:195)
>     at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> {code}
> If we wait a certain amount of time, for some reason everything starts working
> again
> {code}
> ERROR org.apache.spark.rpc.netty.Inbox - Ignoring error
> org.apache.spark.SparkException: Exiting due to error from cluster scheduler: Master removed our application: KILLED
>     at org.apache.spark.scheduler.TaskSchedulerImpl.error(TaskSchedulerImpl.scala:438)
>     at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.dead(SparkDeploySchedulerBackend.scala:124)
>     at org.apache.spark.deploy.client.AppClient$ClientEndpoint.markDead(AppClient.scala:264)
>     at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$receive$1.applyOrElse(AppClient.scala:172)
>     at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
>     at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
>     at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
>     at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> WARN  org.apache.spark.rpc.netty.NettyRpcEnv - Ignored message: true
> WARN  org.apache.spark.deploy.client.AppClient$ClientEndpoint - Connection to rspitzer-rmbp15.local:7077 failed; waiting for master to reconnect...
> WARN  org.apache.spark.deploy.client.AppClient$ClientEndpoint - Connection to rspitzer-rmbp15.local:7077 failed; waiting for master to reconnect...
> gremlin> g.V().count()
> WARN  org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer - HADOOP_GREMLIN_LIBS is not set -- proceeding regardless
> ==>6
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
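
As a possible direction for the "catch it and recreate the context" idea in the report, here is a minimal sketch of the pattern using only the JDK. It is not TinkerPop's or Spark's code: `RestartingContextHolder`, its `factory` and `isStopped` parameters, and the `restarts()` counter are all hypothetical names introduced for illustration. The idea is to check the cached context before every use and rebuild it when it is found stopped, instead of handing out the dead instance.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

/**
 * Hypothetical holder for a long-lived ("persistent") context.
 * It caches one instance and rebuilds it when the supplied check
 * reports that the cached instance has been stopped.
 */
final class RestartingContextHolder<C> {
    private final Supplier<C> factory;     // assumption: creates a fresh context
    private final Predicate<C> isStopped;  // assumption: true once the context is dead
    private C current;
    private int restarts;                  // number of rebuilds after the first creation

    RestartingContextHolder(Supplier<C> factory, Predicate<C> isStopped) {
        this.factory = factory;
        this.isStopped = isStopped;
    }

    /** Returns a live context, recreating it first if the cached one was stopped. */
    synchronized C get() {
        if (current == null) {
            current = factory.get();       // first use: create lazily
        } else if (isStopped.test(current)) {
            current = factory.get();       // cached context was killed: replace it
            restarts++;
        }
        return current;
    }

    /** How many times a stopped context has been replaced. */
    synchronized int restarts() {
        return restarts;
    }
}
```

With Spark, the two arguments might be something like `() -> SparkContext.getOrCreate(conf)` and `SparkContext::isStopped`, assuming the Spark version in use exposes `isStopped()` and that `getOrCreate` returns a fresh context once the old one has been fully stopped. The trace in the report ("The currently active SparkContext was created at: ...") suggests the stopped instance may still be registered as active for a while, so that assumption would need to be checked against the actual Spark behaviour.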