[
https://issues.apache.org/jira/browse/TINKERPOP-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851639#comment-15851639
]
Marko A. Rodriguez commented on TINKERPOP-1271:
-----------------------------------------------
That is ok. The first ERROR is from our test infrastructure. It should really
be a WARN saying that the test is not registered as an "official" test suite.
The latter two WARNs are expected -- the first says that HADOOP_GREMLIN_LIBS
(an environment variable users set to put jars into Hadoop's distributed
cache) is not set, which is fine in testing. The second comes from testing
what happens when an invalid HADOOP_GREMLIN_LIBS directory is provided. Look
at the test name: {{shouldGracefullyHandleBadGremlinHadoopLibs}}.
All in all -- looks like your contribution is solid. Thank you.
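For reference, HADOOP_GREMLIN_LIBS is read straight from the environment, so
a quick way to confirm it is unset is a one-liner in the console (a minimal
Groovy check, nothing TinkerPop-specific):
{code}
gremlin> System.getenv('HADOOP_GREMLIN_LIBS')
==>null
{code}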
> SparkContext should be restarted if Killed and using Persistent Context
> -----------------------------------------------------------------------
>
> Key: TINKERPOP-1271
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1271
> Project: TinkerPop
> Issue Type: Bug
> Components: hadoop
> Affects Versions: 3.2.0-incubating, 3.1.2-incubating
> Reporter: Russell Spitzer
>
> If the persistent SparkContext is killed by the user via the Spark UI, or is
> terminated by some other error, the Gremlin Console/Server is left holding a
> stopped SparkContext. This could be caught and the SparkContext recreated.
> Oddly enough, if you simply wait, the context will "reset" itself, or
> possibly get GC'd out of the system, and everything works again.
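> A fix along these lines could check the cached context before each job
> submission (a rough sketch only; {{Spark.getContext()}} and
> {{SparkContext.isStopped()}} exist in these versions, but the recovery
> wiring itself is hypothetical):
> {code}
> // sketch: before submitting a Spark job, recover from an externally
> // stopped persistent context instead of reusing the dead one
> import org.apache.tinkerpop.gremlin.spark.structure.Spark
> if (Spark.getContext() != null && Spark.getContext().isStopped()) {
>     Spark.close() // drops the cached, stopped context
> }
> // the next Spark.create(...) call then builds a fresh context as usual
> {code}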
> ## Repro
> {code}
> gremlin> g.V().count()
> WARN org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer
> - HADOOP_GREMLIN_LIBS is not set -- proceeding regardless
> ==>6
> gremlin> ERROR org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend
> - Application has been killed. Reason: Master removed our application: KILLED
> ERROR org.apache.spark.scheduler.TaskSchedulerImpl - Lost executor 0 on
> 10.150.0.180: Remote RPC client disassociated. Likely due to containers
> exceeding thresholds, or network issues. Check driver logs for WARN messages.
> // Driver has been killed here via the Master UI
> gremlin> graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
> ==>hadoopgraph[gryoinputformat->gryooutputformat]
> gremlin> g.V().count()
> WARN org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer
> - HADOOP_GREMLIN_LIBS is not set -- proceeding regardless
> java.lang.IllegalStateException: Cannot call methods on a stopped
> SparkContext.
> This stopped SparkContext was created at:
> org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
> org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:53)
> org.apache.tinkerpop.gremlin.spark.structure.io.SparkContextStorage.open(SparkContextStorage.java:60)
> org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:122)
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
> The currently active SparkContext was created at:
> org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
> org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:53)
> org.apache.tinkerpop.gremlin.spark.structure.io.SparkContextStorage.open(SparkContextStorage.java:60)
> org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:122)
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
> {code}
> Full trace from TinkerPop:
> {code}
> at org.apache.spark.SparkContext.org$apache$spark$SparkContext$$assertNotStopped(SparkContext.scala:106)
> at org.apache.spark.SparkContext$$anonfun$newAPIHadoopRDD$1.apply(SparkContext.scala:1130)
> at org.apache.spark.SparkContext$$anonfun$newAPIHadoopRDD$1.apply(SparkContext.scala:1129)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
> at org.apache.spark.SparkContext.withScope(SparkContext.scala:714)
> at org.apache.spark.SparkContext.newAPIHadoopRDD(SparkContext.scala:1129)
> at org.apache.spark.api.java.JavaSparkContext.newAPIHadoopRDD(JavaSparkContext.scala:507)
> at org.apache.tinkerpop.gremlin.spark.structure.io.InputFormatRDD.readGraphRDD(InputFormatRDD.java:42)
> at org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:195)
> at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> {code}
> If we wait a certain amount of time, for some reason everything starts
> working again:
> {code}
> ERROR org.apache.spark.rpc.netty.Inbox - Ignoring error
> org.apache.spark.SparkException: Exiting due to error from cluster scheduler:
> Master removed our application: KILLED
> at org.apache.spark.scheduler.TaskSchedulerImpl.error(TaskSchedulerImpl.scala:438)
> at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.dead(SparkDeploySchedulerBackend.scala:124)
> at org.apache.spark.deploy.client.AppClient$ClientEndpoint.markDead(AppClient.scala:264)
> at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$receive$1.applyOrElse(AppClient.scala:172)
> at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
> at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
> at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
> at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> WARN org.apache.spark.rpc.netty.NettyRpcEnv - Ignored message: true
> WARN org.apache.spark.deploy.client.AppClient$ClientEndpoint - Connection
> to rspitzer-rmbp15.local:7077 failed; waiting for master to reconnect...
> WARN org.apache.spark.deploy.client.AppClient$ClientEndpoint - Connection
> to rspitzer-rmbp15.local:7077 failed; waiting for master to reconnect...
> gremlin> g.V().count()
> WARN org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer
> - HADOOP_GREMLIN_LIBS is not set -- proceeding regardless
> ==>6
> {code}
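> Until a fix lands, a possible manual workaround (untested sketch; assumes
> the {{Spark.close()}} helper in
> {{org.apache.tinkerpop.gremlin.spark.structure.Spark}}) is to discard the
> cached context by hand so the next traversal recreates it, rather than
> waiting:
> {code}
> gremlin> org.apache.tinkerpop.gremlin.spark.structure.Spark.close()
> gremlin> g.V().count()
> ==>6
> {code}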