Russell Alexander Spitzer created TINKERPOP-1271:
----------------------------------------------------

             Summary: SparkContext should be restarted if Killed and using 
Persistent Context
                 Key: TINKERPOP-1271
                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1271
             Project: TinkerPop
          Issue Type: Bug
          Components: hadoop
    Affects Versions: 3.1.2-incubating, 3.2.0-incubating
            Reporter: Russell Alexander Spitzer


If the persisted SparkContext is killed by the user via the Spark UI, or is 
terminated by some other error, the Gremlin Console/Server is left holding a 
stopped SparkContext. This condition could be caught and the SparkContext 
recreated. Oddly enough, if you simply wait, the context will "reset" itself 
(or possibly get GC'd out of the system) and everything works again.
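A minimal sketch of the guard that could be applied where the cached context is handed out (e.g. in {{Spark.create}}): reuse the cached context only while it is still alive, otherwise build a fresh one. To keep the example self-contained it uses a hypothetical {{FakeContext}} stand-in rather than a real {{SparkContext}}; the {{isStopped()}}/{{stop()}} names are assumed to mirror Spark's API, and {{ContextGuard}} is not TinkerPop code.

{code}
public class ContextGuard {
    // Hypothetical stand-in for SparkContext: only tracks stop() calls.
    static final class FakeContext {
        private boolean stopped = false;
        void stop() { stopped = true; }
        boolean isStopped() { return stopped; }
    }

    private static FakeContext CONTEXT;

    // The proposed check: recreate when the cached context was stopped
    // out from under us, instead of returning the dead one.
    static synchronized FakeContext getOrCreate() {
        if (CONTEXT == null || CONTEXT.isStopped()) {
            CONTEXT = new FakeContext();
        }
        return CONTEXT;
    }

    public static void main(String[] args) {
        FakeContext first = getOrCreate();
        first.stop(); // simulate the master killing the application
        FakeContext second = getOrCreate();
        System.out.println(first != second);     // a fresh context was built
        System.out.println(!second.isStopped()); // and it is usable
    }
}
{code}

With a guard like this, the second {{g.V().count()}} in the repro below would transparently get a live context instead of hitting the "Cannot call methods on a stopped SparkContext" error.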

## Repro
{code}
gremlin> g.V().count()
WARN  org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer  - 
HADOOP_GREMLIN_LIBS is not set -- proceeding regardless
==>6
gremlin> ERROR org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend  
- Application has been killed. Reason: Master removed our application: KILLED
ERROR org.apache.spark.scheduler.TaskSchedulerImpl  - Lost executor 0 on 
10.150.0.180: Remote RPC client disassociated. Likely due to containers 
exceeding thresholds, or network issues. Check driver logs for WARN messages.
// Driver has been killed here via the Master UI

gremlin> graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
==>hadoopgraph[gryoinputformat->gryooutputformat]
gremlin> g.V().count()
WARN  org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer  - 
HADOOP_GREMLIN_LIBS is not set -- proceeding regardless
java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.
This stopped SparkContext was created at:

org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:53)
org.apache.tinkerpop.gremlin.spark.structure.io.SparkContextStorage.open(SparkContextStorage.java:60)
org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:122)
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)

The currently active SparkContext was created at:

org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:53)
org.apache.tinkerpop.gremlin.spark.structure.io.SparkContextStorage.open(SparkContextStorage.java:60)
org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:122)
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
{code}


Full stack trace from TinkerPop:
{code}
        at 
org.apache.spark.SparkContext.org$apache$spark$SparkContext$$assertNotStopped(SparkContext.scala:106)
        at 
org.apache.spark.SparkContext$$anonfun$newAPIHadoopRDD$1.apply(SparkContext.scala:1130)
        at 
org.apache.spark.SparkContext$$anonfun$newAPIHadoopRDD$1.apply(SparkContext.scala:1129)
        at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
        at org.apache.spark.SparkContext.withScope(SparkContext.scala:714)
        at 
org.apache.spark.SparkContext.newAPIHadoopRDD(SparkContext.scala:1129)
        at 
org.apache.spark.api.java.JavaSparkContext.newAPIHadoopRDD(JavaSparkContext.scala:507)
        at 
org.apache.tinkerpop.gremlin.spark.structure.io.InputFormatRDD.readGraphRDD(InputFormatRDD.java:42)
        at 
org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:195)
        at 
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
{code}

If we wait a certain amount of time, for some reason everything starts working 
again:

{code}
ERROR org.apache.spark.rpc.netty.Inbox  - Ignoring error
org.apache.spark.SparkException: Exiting due to error from cluster scheduler: 
Master removed our application: KILLED
        at 
org.apache.spark.scheduler.TaskSchedulerImpl.error(TaskSchedulerImpl.scala:438)
        at 
org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.dead(SparkDeploySchedulerBackend.scala:124)
        at 
org.apache.spark.deploy.client.AppClient$ClientEndpoint.markDead(AppClient.scala:264)
        at 
org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$receive$1.applyOrElse(AppClient.scala:172)
        at 
org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
        at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
        at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
        at 
org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
WARN  org.apache.spark.rpc.netty.NettyRpcEnv  - Ignored message: true
WARN  org.apache.spark.deploy.client.AppClient$ClientEndpoint  - Connection to 
rspitzer-rmbp15.local:7077 failed; waiting for master to reconnect...
WARN  org.apache.spark.deploy.client.AppClient$ClientEndpoint  - Connection to 
rspitzer-rmbp15.local:7077 failed; waiting for master to reconnect...
gremlin> g.V().count()
WARN  org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer  - 
HADOOP_GREMLIN_LIBS is not set -- proceeding regardless
==>6
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)