Repository: spark
Updated Branches:
  refs/heads/branch-1.6 d31854da5 -> 0784e02fd
[SPARK-13355][MLLIB] replace GraphImpl.fromExistingRDDs by Graph.apply

`GraphImpl.fromExistingRDDs` expects a preprocessed vertex RDD as input. We call it in LDA without validating this requirement, so it might introduce errors. Replacing it with `Graph.apply` is safer and more appropriate because `Graph.apply` is a public API. The tests still pass, so either it is actually safe to use `fromExistingRDDs` here (though it doesn't seem so based on the implementation) or the test cases are special. jkbradley ankurdave

Author: Xiangrui Meng <[email protected]>

Closes #11226 from mengxr/SPARK-13355.

(cherry picked from commit 764ca18037b6b1884fbc4be9a011714a81495020)
Signed-off-by: Xiangrui Meng <[email protected]>

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0784e02f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0784e02f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0784e02f

Branch: refs/heads/branch-1.6
Commit: 0784e02fd438e5fa2e6639d6bba114fa647dad23
Parents: d31854d
Author: Xiangrui Meng <[email protected]>
Authored: Mon Feb 22 23:54:21 2016 -0800
Committer: Xiangrui Meng <[email protected]>
Committed: Mon Feb 22 23:54:29 2016 -0800

----------------------------------------------------------------------
 .../scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/0784e02f/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala b/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala
index 17c0609..4b06fad 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala
@@ -25,7 +25,6 @@ import breeze.stats.distributions.{Gamma, RandBasis}
 
 import org.apache.spark.annotation.{DeveloperApi, Since}
 import org.apache.spark.graphx._
-import org.apache.spark.graphx.impl.GraphImpl
 import org.apache.spark.mllib.impl.PeriodicGraphCheckpointer
 import org.apache.spark.mllib.linalg.{DenseVector, Matrices, SparseVector, Vector, Vectors}
 import org.apache.spark.rdd.RDD
@@ -186,7 +185,7 @@ final class EMLDAOptimizer extends LDAOptimizer {
       graph.aggregateMessages[(Boolean, TopicCounts)](sendMsg, mergeMsg)
         .mapValues(_._2)
     // Update the vertex descriptors with the new counts.
-    val newGraph = GraphImpl.fromExistingRDDs(docTopicDistributions, graph.edges)
+    val newGraph = Graph(docTopicDistributions, graph.edges)
     graph = newGraph
     graphCheckpointer.update(newGraph)
     globalTopicTotals = computeGlobalTopicTotals()
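For context, a minimal standalone sketch (not part of the patch; the object name and toy data are made up) of building a graph through the public `Graph.apply` entry point. Unlike `GraphImpl.fromExistingRDDs`, which assumes the vertex RDD is already indexed and set up for efficient joins with the edges, `Graph.apply` accepts plain RDDs and does that preprocessing itself:

import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.rdd.RDD

// Hypothetical example, loosely mirroring LDA's document-term graph.
object GraphApplyExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "GraphApplyExample")

    // Vertex attributes: here just a label per vertex ID
    // (LDA stores topic-count vectors instead).
    val vertices: RDD[(Long, String)] =
      sc.parallelize(Seq((1L, "doc"), (2L, "term")))

    // Edge attributes: a token count, as in LDA.
    val edges: RDD[Edge[Double]] =
      sc.parallelize(Seq(Edge(1L, 2L, 3.0)))

    // Graph.apply builds the internal VertexRDD/EdgeRDD itself (indexing,
    // routing tables, default attributes for vertices that appear only in
    // edges), so no preprocessed input is required of the caller.
    val graph = Graph(vertices, edges)
    println(graph.vertices.count()) // 2

    sc.stop()
  }
}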
