[jira] [Commented] (TINKERPOP-1271) SparkContext should be restarted if Killed and using Persistent Context

Marko A. Rodriguez (JIRA) Thu, 21 Apr 2016 17:07:00 -0700

    [ 
https://issues.apache.org/jira/browse/TINKERPOP-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253038#comment-15253038
 ]


Marko A. Rodriguez commented on TINKERPOP-1271:
-----------------------------------------------

I've seen this before too. Do you have a recommended solution? Perhaps a PR :) 
... if not, just some more direction, please.

> SparkContext should be restarted if Killed and using Persistent Context
> -----------------------------------------------------------------------
>
>                 Key: TINKERPOP-1271
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1271
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: hadoop
>    Affects Versions: 3.2.0-incubating, 3.1.2-incubating
>            Reporter: Russell Alexander Spitzer
>
> If the persisted Spark Context is killed by the user via the Spark UI, or is 
> terminated because of some other error, the Gremlin Console/Server is left 
> with a stopped Spark Context. This could be caught and the Spark Context 
> recreated. Oddly enough, if you simply wait, the context will "reset" itself, 
> or possibly get GC'd out of the system, and everything works again. 
> ##Repro
> {code}
> gremlin> g.V().count()
> WARN  org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer  - HADOOP_GREMLIN_LIBS is not set -- proceeding regardless
> ==>6
> gremlin> ERROR org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend  - Application has been killed. Reason: Master removed our application: KILLED
> ERROR org.apache.spark.scheduler.TaskSchedulerImpl  - Lost executor 0 on 10.150.0.180: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
> // Driver has been killed here via the Master UI
> gremlin> graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
> ==>hadoopgraph[gryoinputformat->gryooutputformat]
> gremlin> g.V().count()
> WARN  org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer  - HADOOP_GREMLIN_LIBS is not set -- proceeding regardless
> java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.
> This stopped SparkContext was created at:
> org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
> org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:53)
> org.apache.tinkerpop.gremlin.spark.structure.io.SparkContextStorage.open(SparkContextStorage.java:60)
> org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:122)
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
> The currently active SparkContext was created at:
> org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
> org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:53)
> org.apache.tinkerpop.gremlin.spark.structure.io.SparkContextStorage.open(SparkContextStorage.java:60)
> org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:122)
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
> {code}
> Full trace from TP
> {code}
>       at org.apache.spark.SparkContext.org$apache$spark$SparkContext$$assertNotStopped(SparkContext.scala:106)
>       at org.apache.spark.SparkContext$$anonfun$newAPIHadoopRDD$1.apply(SparkContext.scala:1130)
>       at org.apache.spark.SparkContext$$anonfun$newAPIHadoopRDD$1.apply(SparkContext.scala:1129)
>       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>       at org.apache.spark.SparkContext.withScope(SparkContext.scala:714)
>       at org.apache.spark.SparkContext.newAPIHadoopRDD(SparkContext.scala:1129)
>       at org.apache.spark.api.java.JavaSparkContext.newAPIHadoopRDD(JavaSparkContext.scala:507)
>       at org.apache.tinkerpop.gremlin.spark.structure.io.InputFormatRDD.readGraphRDD(InputFormatRDD.java:42)
>       at org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:195)
>       at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> {code}
> If we wait a certain amount of time, for some reason everything starts 
> working again:
> {code}
> ERROR org.apache.spark.rpc.netty.Inbox  - Ignoring error
> org.apache.spark.SparkException: Exiting due to error from cluster scheduler: Master removed our application: KILLED
>       at org.apache.spark.scheduler.TaskSchedulerImpl.error(TaskSchedulerImpl.scala:438)
>       at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.dead(SparkDeploySchedulerBackend.scala:124)
>       at org.apache.spark.deploy.client.AppClient$ClientEndpoint.markDead(AppClient.scala:264)
>       at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$receive$1.applyOrElse(AppClient.scala:172)
>       at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
>       at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
>       at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
>       at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> WARN  org.apache.spark.rpc.netty.NettyRpcEnv  - Ignored message: true
> WARN  org.apache.spark.deploy.client.AppClient$ClientEndpoint  - Connection to rspitzer-rmbp15.local:7077 failed; waiting for master to reconnect...
> WARN  org.apache.spark.deploy.client.AppClient$ClientEndpoint  - Connection to rspitzer-rmbp15.local:7077 failed; waiting for master to reconnect...
> gremlin> g.V().count()
> WARN  org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer  - HADOOP_GREMLIN_LIBS is not set -- proceeding regardless
> ==>6
> {code}
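
The description's suggestion that a stopped context "could be caught and the spark context recreated" might look roughly like the sketch below. It is only an illustration of the check-then-recreate idea, not TinkerPop's code or its eventual fix. `RestartableContext`, `FakeContext`, and `PersistentContextDemo` are hypothetical names, and `FakeContext` stands in for a real Spark context so the example runs without Spark on the classpath. With Spark itself, the stopped check would presumably map to `SparkContext.isStopped` and the factory to `SparkContext.getOrCreate` (the call shown in the trace above); that mapping is an assumption.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical holder: keeps one long-lived context and replaces it once it is found stopped. */
final class RestartableContext<C> {
    private final Supplier<C> factory;    // builds a fresh context
    private final Predicate<C> isStopped; // reports whether a context is dead
    private C current;

    RestartableContext(Supplier<C> factory, Predicate<C> isStopped) {
        this.factory = factory;
        this.isStopped = isStopped;
    }

    /** Returns a live context, creating one on first use or when the cached one is stopped. */
    synchronized C get() {
        if (current == null || isStopped.test(current)) {
            current = factory.get();
        }
        return current;
    }
}

/** Hypothetical stand-in for a Spark context; not a Spark class. */
final class FakeContext {
    private volatile boolean stopped;

    void stop() { stopped = true; }

    boolean isStopped() { return stopped; }

    long count() {
        if (stopped) {
            throw new IllegalStateException("Cannot call methods on a stopped context");
        }
        return 6L; // fixed value, mirroring the ==>6 in the console session above
    }
}

final class PersistentContextDemo {
    public static void main(String[] args) {
        RestartableContext<FakeContext> holder =
                new RestartableContext<>(FakeContext::new, FakeContext::isStopped);

        FakeContext first = holder.get();
        System.out.println(first.count());   // prints 6

        first.stop();                         // simulates the kill from the Master UI

        FakeContext second = holder.get();    // stopped context is detected and replaced
        System.out.println(second != first);  // prints true
        System.out.println(second.count());   // prints 6
    }
}
```

The check happens on every `get()`, so a caller never has to catch the `IllegalStateException` itself. Whether that is cheap and safe enough to do per job submission in `SparkGraphComputer` is an open question this sketch does not answer.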



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
