[ https://issues.apache.org/jira/browse/SPARK-15002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15421147#comment-15421147 ]
Sean Owen commented on SPARK-15002:
-----------------------------------
Do you see any errors? What more can you say about what's hung -- are any
processes active but not making progress? Does a thread dump suggest a thread
is blocked? Is something GC thrashing? Is a task perhaps just taking a very
long time to progress? It would probably require more info to make any progress
here.
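If it helps, one quick check is to dump all driver-side thread stacks from the same spark-shell session while the job appears hung; executor thread dumps can also be taken from the Executors tab of the Spark UI. A rough sketch (plain JVM APIs, nothing Spark-specific):

import scala.collection.JavaConverters._

// Print the state and stack of every thread in the driver JVM.
// Threads sitting in BLOCKED or WAITING while the write appears hung are the interesting ones.
Thread.getAllStackTraces.asScala.foreach { case (thread, frames) =>
  println(s"--- ${thread.getName} (state=${thread.getState}) ---")
  frames.foreach(frame => println(s"    at $frame"))
}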
> Calling unpersist can cause Spark to hang indefinitely when writing out a result
> ---------------------------------------------------------------------------------
>
> Key: SPARK-15002
> URL: https://issues.apache.org/jira/browse/SPARK-15002
> Project: Spark
> Issue Type: Bug
> Components: GraphX, Spark Core
> Affects Versions: 1.5.2, 1.6.0, 2.0.0
> Environment: AWS and Linux VM, both in spark-shell and spark-submit.
> Tested in 1.5.2, 1.6, and 2.0.
> Reporter: Jamie Hutton
>
> The following code will cause Spark to hang indefinitely. It happens when you
> have an unpersist call that is followed by some further processing of that data
> (in my case, writing it out).
> I first experienced this issue with GraphX, so my example below involves
> GraphX code; however, I suspect it might be more of a core issue than a GraphX
> one. I have raised another bug with similar results (indefinite hanging) but
> in different circumstances, so these may well be linked.
> Code to reproduce (can be run in spark-shell):
> import org.apache.spark.graphx._
> import org.apache.spark.rdd.RDD
> import org.apache.spark.sql.types._
> import org.apache.spark.sql._
> val r = scala.util.Random
> val list = (0L to 500L).map(i => (i, r.nextInt(500).toLong, "LABEL"))
> val distData = sc.parallelize(list)
> val edgesRDD = distData.map(x => Edge(x._1, x._2, x._3))
> val distinctNodes = distData.flatMap { row => Iterable((row._1, "A"), (row._2, "B")) }.distinct()
> val nodesRDD: RDD[(VertexId, String)] = distinctNodes
> val graph = Graph(nodesRDD, edgesRDD)
> graph.persist()
> val ccGraph = graph.connectedComponents()
> ccGraph.cache()
>
> val schema = StructType(Seq(StructField("id", LongType, false), StructField("netid", LongType, false)))
> val rdd = ccGraph.vertices.map(row => Row(row._1.asInstanceOf[Long], row._2.asInstanceOf[Long]))
> val builtEntityDF = sqlContext.createDataFrame(rdd, schema)
>
> /* this unpersist step causes the issue */
> ccGraph.unpersist()
>
> /* write step hangs forever */
> builtEntityDF.write.format("parquet").mode("overwrite").save("/user/root/writetest.parquet")
>
> If you take out the ccGraph.unpersist() step, the write step completes
> successfully.
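> A possible workaround (untested here, so only an assumption based on the unpersist API) might be to make the unpersist non-blocking, or simply to move it after the write:
>
> /* sketch: non-blocking unpersist, so the write is not waiting on block removal */
> ccGraph.unpersist(blocking = false)
> builtEntityDF.write.format("parquet").mode("overwrite").save("/user/root/writetest.parquet")
>
> /* ...or keep the default blocking unpersist, but issue it only after the write has completed */
> builtEntityDF.write.format("parquet").mode("overwrite").save("/user/root/writetest.parquet")
> ccGraph.unpersist()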