Repository: spark
Updated Branches:
  refs/heads/branch-1.4 6598590f9 -> ed3c1d700
[SPARK-13355][MLLIB] replace GraphImpl.fromExistingRDDs by Graph.apply

`GraphImpl.fromExistingRDDs` expects a preprocessed vertex RDD as input. We call it in LDA without validating this requirement, so it might introduce errors. Replacing it with `Graph.apply` would be safer and more proper because it is a public API. The tests still pass, so either it is safe to use `fromExistingRDDs` here (though the implementation suggests otherwise) or the test cases are special.

jkbradley ankurdave

Author: Xiangrui Meng <[email protected]>

Closes #11226 from mengxr/SPARK-13355.

(cherry picked from commit 764ca18037b6b1884fbc4be9a011714a81495020)
Signed-off-by: Xiangrui Meng <[email protected]>

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ed3c1d70
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ed3c1d70
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ed3c1d70

Branch: refs/heads/branch-1.4
Commit: ed3c1d70076f68eefb4690a0786b1c6950cb09b7
Parents: 6598590
Author: Xiangrui Meng <[email protected]>
Authored: Mon Feb 22 23:54:21 2016 -0800
Committer: Xiangrui Meng <[email protected]>
Committed: Mon Feb 22 23:54:52 2016 -0800

----------------------------------------------------------------------
 .../scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/ed3c1d70/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala b/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala
index 8e5154b..6a8b4ef 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala
@@ -25,7 +25,6 @@ import breeze.stats.distributions.{Gamma, RandBasis}
 
 import org.apache.spark.annotation.DeveloperApi
 import org.apache.spark.graphx._
-import org.apache.spark.graphx.impl.GraphImpl
 import org.apache.spark.mllib.impl.PeriodicGraphCheckpointer
 import org.apache.spark.mllib.linalg.{Matrices, SparseVector, DenseVector, Vector}
 import org.apache.spark.rdd.RDD
@@ -183,7 +182,7 @@ final class EMLDAOptimizer extends LDAOptimizer {
       graph.aggregateMessages[(Boolean, TopicCounts)](sendMsg, mergeMsg)
         .mapValues(_._2)
     // Update the vertex descriptors with the new counts.
-    val newGraph = GraphImpl.fromExistingRDDs(docTopicDistributions, graph.edges)
+    val newGraph = Graph(docTopicDistributions, graph.edges)
     graph = newGraph
     graphCheckpointer.updateGraph(newGraph)
     globalTopicTotals = computeGlobalTopicTotals()
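For context, the safety argument above is that `Graph.apply` (the public GraphX constructor) accepts arbitrary vertex and edge RDDs and builds the internal `VertexRDD` and routing tables itself, whereas `GraphImpl.fromExistingRDDs` assumes that preprocessing has already been done. A minimal standalone sketch of the public-API call used by this patch, with hypothetical toy data standing in for the LDA vertices and edges:

----------------------------------------------------------------------
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

object GraphApplySketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("GraphApplySketch").setMaster("local[2]"))

    // Hypothetical toy data: plain (VertexId, attr) pairs and Edge objects,
    // not a preprocessed VertexRDD. Graph.apply is safe on such input;
    // GraphImpl.fromExistingRDDs is not guaranteed to be.
    val vertices = sc.parallelize(Seq((1L, "doc"), (2L, "term"), (3L, "term")))
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 1.0), Edge(1L, 3L, 2.0)))

    // The public constructor used by the patch: Graph(vertices, edges).
    val graph = Graph(vertices, edges)

    println(s"vertices = ${graph.vertices.count()}, edges = ${graph.edges.count()}")
    sc.stop()
  }
}
----------------------------------------------------------------------

In the patched `EMLDAOptimizer`, the same call shape is `Graph(docTopicDistributions, graph.edges)`, trading a small amount of reconstruction work inside `Graph.apply` for correctness that does not depend on an undocumented precondition of the private `GraphImpl` API.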
