Repository: spark
Updated Branches:
  refs/heads/master 72427c3e1 -> 764ca1803


[SPARK-13355][MLLIB] replace GraphImpl.fromExistingRDDs by Graph.apply

`GraphImpl.fromExistingRDDs` expects a preprocessed vertex RDD as input, but LDA 
calls it without validating that requirement, which could introduce errors. 
Replacing it with `Graph.apply` is safer and more appropriate because it is a 
public API. The existing tests still pass, so either `fromExistingRDDs` happens 
to be safe here (though the implementation suggests otherwise) or the test 
cases are special. cc jkbradley ankurdave
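The difference can be sketched with a minimal, hypothetical example (not part of the patch): `Graph.apply` accepts plain RDDs and performs the vertex deduplication and routing-table setup itself, whereas `GraphImpl.fromExistingRDDs` assumes the caller has already done that preprocessing. The toy vertices and edges below are illustration data only.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

object GraphApplySketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("graph-apply-sketch"))

    // Plain RDDs, not a preprocessed VertexRDD/EdgeRDD pair.
    val vertices = sc.parallelize(Seq((1L, 1.0), (2L, 2.0)))
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 1.0)))

    // Public entry point: builds the co-partitioned vertex view itself
    // instead of assuming the caller prepared it, which is what
    // GraphImpl.fromExistingRDDs requires of its input.
    val graph = Graph(vertices, edges, defaultVertexAttr = 0.0)
    assert(graph.triplets.count() == 1L)

    sc.stop()
  }
}
```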

Author: Xiangrui Meng <[email protected]>

Closes #11226 from mengxr/SPARK-13355.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/764ca180
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/764ca180
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/764ca180

Branch: refs/heads/master
Commit: 764ca18037b6b1884fbc4be9a011714a81495020
Parents: 72427c3
Author: Xiangrui Meng <[email protected]>
Authored: Mon Feb 22 23:54:21 2016 -0800
Committer: Xiangrui Meng <[email protected]>
Committed: Mon Feb 22 23:54:21 2016 -0800

----------------------------------------------------------------------
 .../scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala    | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/764ca180/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala b/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala
index 7a41f74..7491ab0 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala
@@ -25,7 +25,6 @@ import breeze.stats.distributions.{Gamma, RandBasis}
 
 import org.apache.spark.annotation.{DeveloperApi, Since}
 import org.apache.spark.graphx._
-import org.apache.spark.graphx.impl.GraphImpl
 import org.apache.spark.mllib.impl.PeriodicGraphCheckpointer
 import org.apache.spark.mllib.linalg.{DenseVector, Matrices, SparseVector, Vector, Vectors}
 import org.apache.spark.rdd.RDD
@@ -188,7 +187,7 @@ final class EMLDAOptimizer extends LDAOptimizer {
       graph.aggregateMessages[(Boolean, TopicCounts)](sendMsg, mergeMsg)
         .mapValues(_._2)
     // Update the vertex descriptors with the new counts.
-    val newGraph = GraphImpl.fromExistingRDDs(docTopicDistributions, graph.edges)
+    val newGraph = Graph(docTopicDistributions, graph.edges)
     graph = newGraph
     graphCheckpointer.update(newGraph)
     globalTopicTotals = computeGlobalTopicTotals()

