Repository: spark
Updated Branches:
  refs/heads/branch-1.4 6598590f9 -> ed3c1d700
[SPARK-13355][MLLIB] replace GraphImpl.fromExistingRDDs by Graph.apply

`GraphImpl.fromExistingRDDs` expects a preprocessed vertex RDD as input. We call it in LDA without validating this requirement, so it might introduce errors. Replacing it with `Graph.apply` would be safer and more proper because it is a public API. The tests still pass, so either it is safe to use `fromExistingRDDs` here (though the implementation suggests otherwise) or the test cases are special.

jkbradley ankurdave

Author: Xiangrui Meng <[email protected]>

Closes #11226 from mengxr/SPARK-13355.

(cherry picked from commit 764ca18037b6b1884fbc4be9a011714a81495020)
Signed-off-by: Xiangrui Meng <[email protected]>

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ed3c1d70
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ed3c1d70
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ed3c1d70

Branch: refs/heads/branch-1.4
Commit: ed3c1d70076f68eefb4690a0786b1c6950cb09b7
Parents: 6598590
Author: Xiangrui Meng <[email protected]>
Authored: Mon Feb 22 23:54:21 2016 -0800
Committer: Xiangrui Meng <[email protected]>
Committed: Mon Feb 22 23:54:52 2016 -0800

----------------------------------------------------------------------
 .../scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/ed3c1d70/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala b/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala
index 8e5154b..6a8b4ef 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala
@@ -25,7 +25,6 @@ import breeze.stats.distributions.{Gamma, RandBasis}
 
 import org.apache.spark.annotation.DeveloperApi
 import org.apache.spark.graphx._
-import org.apache.spark.graphx.impl.GraphImpl
 import org.apache.spark.mllib.impl.PeriodicGraphCheckpointer
 import org.apache.spark.mllib.linalg.{Matrices, SparseVector, DenseVector, Vector}
 import org.apache.spark.rdd.RDD
@@ -183,7 +182,7 @@ final class EMLDAOptimizer extends LDAOptimizer {
       graph.aggregateMessages[(Boolean, TopicCounts)](sendMsg, mergeMsg)
         .mapValues(_._2)
     // Update the vertex descriptors with the new counts.
-    val newGraph = GraphImpl.fromExistingRDDs(docTopicDistributions, graph.edges)
+    val newGraph = Graph(docTopicDistributions, graph.edges)
     graph = newGraph
     graphCheckpointer.updateGraph(newGraph)
     globalTopicTotals = computeGlobalTopicTotals()
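For context, the safety argument above is that `Graph.apply` (the public GraphX constructor) accepts arbitrary vertex and edge RDDs and builds the internal `VertexRDD` and routing tables itself, whereas `GraphImpl.fromExistingRDDs` assumes that preprocessing has already been done. A minimal standalone sketch of the public-API call used by this patch, with hypothetical toy data standing in for the LDA vertices and edges:

----------------------------------------------------------------------
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

object GraphApplySketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("GraphApplySketch").setMaster("local[2]"))

    // Hypothetical toy data: plain (VertexId, attr) pairs and Edge objects,
    // not a preprocessed VertexRDD. Graph.apply is safe on such input;
    // GraphImpl.fromExistingRDDs is not guaranteed to be.
    val vertices = sc.parallelize(Seq((1L, "doc"), (2L, "term"), (3L, "term")))
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 1.0), Edge(1L, 3L, 2.0)))

    // The public constructor used by the patch: Graph(vertices, edges).
    val graph = Graph(vertices, edges)

    println(s"vertices = ${graph.vertices.count()}, edges = ${graph.edges.count()}")
    sc.stop()
  }
}
----------------------------------------------------------------------

In the patched `EMLDAOptimizer`, the same call shape is `Graph(docTopicDistributions, graph.edges)`, trading a small amount of reconstruction work inside `Graph.apply` for correctness that does not depend on an undocumented precondition of the private `GraphImpl` API.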
