[
https://issues.apache.org/jira/browse/SPARK-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041448#comment-14041448
]
Baoxu Shi edited comment on SPARK-2245 at 6/25/14 1:43 AM:
-----------------------------------------------------------
Hi [~ankurd], I updated my pull request, but I now hit another exception:
ShippableVertexPartition is not serializable. After I made it serializable, a
second exception appeared: org.apache.spark.graphx.impl.RoutingTablePartition is
not serializable. I made that class serializable as well, but on iteration 2
there is a new exception: org.apache.spark.graphx.impl.ShippableVertexPartition
cannot be cast to scala.Tuple2
The code I'm using is:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

val conf = new SparkConf().setAppName("HDTM")
  .setMaster("local[4]")
val sc = new SparkContext(conf)
sc.setCheckpointDir("./checkpoint")

// Small 3-vertex cycle used to reproduce the problem
val v = sc.parallelize(Seq[(VertexId, Long)]((0L, 0L), (1L, 1L), (2L, 2L)))
val e = sc.parallelize(Seq[Edge[Long]](Edge(0L, 1L, 0L), Edge(1L, 2L, 1L),
  Edge(2L, 0L, 2L)))
var g = Graph(v, e)
val vertexIds = Seq(0L, 1L, 2L)
var prevG: Graph[VertexId, Long] = null

for (i <- 1 to 2000) {
  // Rebuild the graph once per vertex id, so the lineage keeps growing
  vertexIds.toStream.foreach(id => {
    prevG = g
    g = Graph(g.vertices, g.edges)
    g.vertices.cache()
    g.edges.cache()
    prevG.unpersistVertices(blocking = false)
    prevG.edges.unpersist(blocking = false)
  })

  // Mark both RDDs for checkpointing, then run actions to materialize them
  g.vertices.checkpoint()
  g.edges.checkpoint()
  g.edges.count()
  g.vertices.count()
  println(s"${g.vertices.isCheckpointed} ${g.edges.isCheckpointed}")
  println(" iter " + i + " finished")
}

println(g.vertices.collect().mkString(" "))
println(g.edges.collect().mkString(" "))
Am I on the right track, or is there a better way to change it?
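For comparison, here is a minimal sketch of the same checkpoint-then-materialize pattern on a plain RDD. It relies on checkpoint() only marking the RDD, with the data written once a job runs on that same RDD. The object CheckpointSketch and the helper materialize are hypothetical names used for illustration, not part of Spark, and foreachPartition with a no-op body is just one possible way to trigger a job on the RDD itself.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CheckpointSketch {
  // Hypothetical helper (not part of Spark): marks `rdd` for checkpointing,
  // runs a job on `rdd` itself, and reports whether the checkpoint was written.
  // Assumes sc.setCheckpointDir(...) was called and no job has run on `rdd` yet.
  def materialize[T](rdd: RDD[T]): Boolean = {
    rdd.cache()                    // avoid recomputing when the checkpoint is written
    rdd.checkpoint()               // only marks the RDD; nothing is written yet
    rdd.foreachPartition(_ => ())  // no-op action that runs a job on this RDD
    rdd.isCheckpointed
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("CheckpointSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    sc.setCheckpointDir("./checkpoint")
    val rdd = sc.parallelize(1 to 10).map(_ * 2)
    println(s"checkpointed: ${materialize(rdd)}")
    sc.stop()
  }
}
```

Whether the same approach works for a VertexRDD is an open question here: given the exceptions above, it probably depends on the partition classes being serializable and on how the VertexRDD reads its data back from the checkpoint.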
was (Author: bxshi):
Just submit the changes, thanks!
> VertexRDD can not be materialized for checkpointing
> ---------------------------------------------------
>
> Key: SPARK-2245
> URL: https://issues.apache.org/jira/browse/SPARK-2245
> Project: Spark
> Issue Type: Bug
> Components: GraphX
> Reporter: Baoxu Shi
>
> It seems one cannot materialize a VertexRDD by simply calling the count
> method, which VertexRDD overrides. Calling RDD's count, however, does
> materialize it.
> Is this a feature designed to get the count without materializing the
> VertexRDD? If so, do you think it is necessary to add a materialize
> method to VertexRDD?
> By the way, is count() the cheapest way to materialize an RDD, or does it
> cost the same resources as other actions?
> The pull request is here:
> https://github.com/apache/spark/pull/1177
> Best,