Jamie Hutton created SPARK-15002:
------------------------------------

             Summary: Calling unpersist can cause spark to hang indefinitely 
when writing out a result
                 Key: SPARK-15002
                 URL: https://issues.apache.org/jira/browse/SPARK-15002
             Project: Spark
          Issue Type: Bug
          Components: GraphX, Spark Core
    Affects Versions: 1.6.0, 1.5.2
         Environment: AWS and Linux VM, both in spark-shell and spark-submit. 
tested in 1.5.2 and 1.6
            Reporter: Jamie Hutton


The following code will cause spark to hang indefinitely. It happens when you 
have an unpersist which is followed by some futher processing of that data (in 
my case writing it out).

I first experienced this issue with graphX so my example below involved graphX 
code, however i suspect it might be more of a core issue than a graphx one. I 
have raised another bug with similar results (indefinite hanging) but in 
different circumstances, so these may well be linked

Code to reproduce (can be run in spark-shell):

import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.types._
import org.apache.spark.sql._

val r = scala.util.Random
val list = (0L to 500L).map(i=>(i,r.nextInt(500).asInstanceOf[Long],"LABEL"))
val distData = sc.parallelize(list)
val edgesRDD = distData.map(x => Edge(x._1, x._2, x._3))
val distinctNodes = distData.flatMap{row => Iterable((row._1, ("A")),(row._2, 
("B")))}.distinct()
val nodesRDD: RDD[(VertexId, (String))] = distinctNodes

val graph = Graph(nodesRDD, edgesRDD)
graph.persist()

val ccGraph = graph.connectedComponents()
ccGraph.cache
   
val schema = StructType(Seq(StructField("id", LongType, false), 
StructField("netid", LongType, false)))
val 
rdd=ccGraph.vertices.map(row=>(Row(row._1.asInstanceOf[Long],row._2.asInstanceOf[Long])))
val builtEntityDF = sqlContext.createDataFrame(rdd, schema)
 
/*this unpersist step causes the issue*/
ccGraph.unpersist()
 
/*write step hangs for ever*/
builtEntityDF.write.format("parquet").mode("overwrite").save("/user/root/writetest.parquet")
 


If you take out the ccGraph.unpersist() step the write step completes 
successfully



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to