[jira] [Comment Edited] (SPARK-2245) VertexRDD can not be materialized for checkpointing

Baoxu Shi (JIRA) Tue, 24 Jun 2014 18:45:08 -0700

    [ https://issues.apache.org/jira/browse/SPARK-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041448#comment-14041448 ]


Baoxu Shi edited comment on SPARK-2245 at 6/25/14 1:43 AM:
-----------------------------------------------------------

Hi [~ankurd], I updated my pull request, but I now hit another exception: 
ShippableVertexPartition is not serializable. I made it serializable, and then 
got another exception: org.apache.spark.graphx.impl.RoutingTablePartition is 
not serializable. After making that serializable as well, iteration 2 fails 
with: org.apache.spark.graphx.impl.ShippableVertexPartition cannot be cast to 
scala.Tuple2

The code I'm using is:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.graphx._

    val conf = new SparkConf().setAppName("HDTM")
      .setMaster("local[4]")
    val sc = new SparkContext(conf)
    sc.setCheckpointDir("./checkpoint")
    // Small 3-vertex cycle used to reproduce the problem.
    val v = sc.parallelize(Seq[(VertexId, Long)]((0L, 0L), (1L, 1L), (2L, 2L)))
    val e = sc.parallelize(Seq[Edge[Long]](Edge(0L, 1L, 0L), Edge(1L, 2L, 1L),
      Edge(2L, 0L, 2L)))
    var g = Graph(v, e)

    val vertexIds = Seq(0L, 1L, 2L)
    var prevG: Graph[VertexId, Long] = null
    for (i <- 1 to 2000) {
      vertexIds.toStream.foreach(id => {
        prevG = g
        g = Graph(g.vertices, g.edges)

        g.vertices.cache()
        g.edges.cache()
        prevG.unpersistVertices(blocking = false)
        prevG.edges.unpersist(blocking = false)
      })

      g.vertices.checkpoint()
      g.edges.checkpoint()

      g.edges.count()
      g.vertices.count()
      println(s"${g.vertices.isCheckpointed} ${g.edges.isCheckpointed}")

      println(" iter " + i + " finished")
    }

    println(g.vertices.collect().mkString(" "))
    println(g.edges.collect().mkString(" "))

Am I on the right track, or is there a better way to fix this?


was (Author: bxshi):
Just submit the changes, thanks!

> VertexRDD can not be materialized for checkpointing
> ---------------------------------------------------
>
>                 Key: SPARK-2245
>                 URL: https://issues.apache.org/jira/browse/SPARK-2245
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>            Reporter: Baoxu Shi
>
> It seems one cannot materialize a VertexRDD by simply calling the count 
> method, which is overridden by VertexRDD. Calling RDD's count, however, does 
> materialize it.
> Is this a feature designed to get the count without materializing the 
> VertexRDD? If so, do you think it is necessary to add a materialize method 
> to VertexRDD?
> By the way, is count() the cheapest way to materialize an RDD, or does it 
> cost the same resources as other actions?
> The pull request is here:
> https://github.com/apache/spark/pull/1177
> Best,



--
This message was sent by Atlassian JIRA
(v6.2#6252)
