[ https://issues.apache.org/jira/browse/SPARK-15002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15421234#comment-15421234 ]

Sean Owen commented on SPARK-15002:
-----------------------------------

Yeah, I just mean it ought to be fine and there's no obvious reason something 
would fail. If nothing is RUNNING, can you search the thread dump for 
"deadlock"? The JVM can identify some deadlocks by itself. It's also worth 
browsing any "waiting on lock" messages in the trace to see if something looks 
suspicious. I'm wondering, however, how you can have 100% CPU but no running 
threads... are we talking about the same thing in both cases, i.e. an executor?
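
For what it's worth, a minimal Scala sketch of asking the JVM for its own 
deadlock detection (the same check jstack prints at the end of a thread dump); 
nothing here is Spark-specific, and it can be pasted straight into spark-shell 
on the driver:

import java.lang.management.ManagementFactory

val threads = ManagementFactory.getThreadMXBean
// findDeadlockedThreads returns null when no monitor deadlock is detected
Option(threads.findDeadlockedThreads()) match {
  case Some(ids) => threads.getThreadInfo(ids, 20).foreach(println) // dump the deadlocked stacks
  case None      => println("No JVM-detectable deadlock")
}

This only catches deadlocks the JVM can prove (threads blocked on each other's 
monitors); a hang caused by a spinning or permanently WAITING thread won't show 
up here, which is where reading the "waiting on lock" lines by hand comes in.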

> Calling unpersist can cause spark to hang indefinitely when writing out a 
> result
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-15002
>                 URL: https://issues.apache.org/jira/browse/SPARK-15002
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX, Spark Core
>    Affects Versions: 1.5.2, 1.6.0, 2.0.0
>         Environment: AWS and Linux VM, both in spark-shell and spark-submit. 
> Tested in 1.5.2, 1.6, and 2.0.
>            Reporter: Jamie Hutton
>
> The following code will cause Spark to hang indefinitely. It happens when an 
> unpersist is followed by some further processing of that data (in my case, 
> writing it out).
> I first experienced this issue with GraphX, so my example below involves 
> GraphX code; however, I suspect it might be more of a core issue than a 
> GraphX one. I have raised another bug with similar results (indefinite 
> hanging) but in different circumstances, so these may well be linked.
> Code to reproduce (can be run in spark-shell):
> import org.apache.spark.graphx._
> import org.apache.spark.rdd.RDD
> import org.apache.spark.sql.types._
> import org.apache.spark.sql._
> val r = scala.util.Random
> // build a random edge list over ~500 vertices
> val list = (0L to 500L).map(i => (i, r.nextInt(500).toLong, "LABEL"))
> val distData = sc.parallelize(list)
> val edgesRDD = distData.map(x => Edge(x._1, x._2, x._3))
> val distinctNodes = distData.flatMap { row =>
>   Iterable((row._1, "A"), (row._2, "B"))
> }.distinct()
> val nodesRDD: RDD[(VertexId, String)] = distinctNodes
> val graph = Graph(nodesRDD, edgesRDD)
> graph.persist()
> val ccGraph = graph.connectedComponents()
> ccGraph.cache()
>
> // convert the component assignments to a DataFrame
> val schema = StructType(Seq(StructField("id", LongType, false),
>   StructField("netid", LongType, false)))
> val rdd = ccGraph.vertices.map(row => Row(row._1, row._2))
> val builtEntityDF = sqlContext.createDataFrame(rdd, schema)
>  
> /* this unpersist step causes the issue */
> ccGraph.unpersist()
>
> /* the write step then hangs forever */
> builtEntityDF.write.format("parquet").mode("overwrite").save("/user/root/writetest.parquet")
>
> If you take out the ccGraph.unpersist() step, the write step completes 
> successfully.
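> A possible workaround sketch (my assumption; I have not verified that it 
> avoids the hang): force the DataFrame to be materialised before the 
> unpersist, so the write never has to recompute from the now-unpersisted 
> graph:
> builtEntityDF.persist()
> builtEntityDF.count()   // action: computes and caches the DataFrame
> ccGraph.unpersist()     // the graph's cached blocks can now be dropped
> builtEntityDF.write.format("parquet").mode("overwrite").save("/user/root/writetest.parquet")
> Since Graph.unpersist defaults to blocking = true, calling 
> ccGraph.unpersist(blocking = false) might also be worth trying.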


