[ https://issues.apache.org/jira/browse/SPARK-4259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284561#comment-14284561 ]
Xiangrui Meng commented on SPARK-4259:
--------------------------------------

Note: [~javadba]'s update is from an offline discussion we had. The algorithm we plan to implement is described in the paper Power Iteration Clustering (PIC) (http://www.icml2010.org/papers/387.pdf), and the notation is adapted from there.

> Add Spectral Clustering Algorithm with Gaussian Similarity Function
> -------------------------------------------------------------------
>
>                 Key: SPARK-4259
>                 URL: https://issues.apache.org/jira/browse/SPARK-4259
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Fan Jiang
>            Assignee: Fan Jiang
>              Labels: features
>
> In recent years, spectral clustering has become one of the most popular
> modern clustering algorithms. It is simple to implement, can be solved
> efficiently by standard linear algebra software, and very often outperforms
> traditional clustering algorithms such as the k-means algorithm.
> We implemented unnormalized spectral clustering, building the graph
> Laplacian from a Gaussian similarity function. A brief design follows:
>
> Unnormalized spectral clustering
> Input: n raw data points and the number k of clusters to construct.
> • Compute the similarity matrix S ∈ R^{n×n} using the Gaussian similarity function.
> • Construct a similarity graph. Let W be its weighted adjacency matrix.
> • Compute the unnormalized Laplacian L = D - W, where D is the diagonal
>   degree matrix.
> • Compute the first k eigenvectors u_1, ..., u_k of L (those belonging to
>   the k smallest eigenvalues).
> • Let U ∈ R^{n×k} be the matrix containing the vectors u_1, ..., u_k as columns.
> • For i = 1, ..., n, let y_i ∈ R^k be the vector corresponding to the i-th
>   row of U.
> • Cluster the points (y_i)_{i=1,...,n} in R^k with the k-means algorithm into
>   clusters C_1, ..., C_k.
> Output: clusters A_1, ..., A_k with A_i = { j | y_j ∈ C_i }.
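The unnormalized spectral clustering steps in the issue description can be sketched in a few dozen lines of dependency-free Python. This is only an illustration, not MLlib code (the comment above says PIC is the algorithm planned for Spark). Assumptions not taken from the issue: the similarity graph is the fully connected graph with W = S; the Gaussian similarity is w_ij = exp(-||x_i - x_j||^2 / (2σ^2)) with a hand-picked σ; eigenvectors come from the cyclic Jacobi method and k-means uses a deterministic farthest-first initialization, both chosen only to keep the example self-contained. All function names (gaussian_similarity, unnormalized_laplacian, jacobi_eigen, kmeans, spectral_clustering) are hypothetical.

```python
import math

# Hypothetical, dependency-free sketch of unnormalized spectral clustering; not Spark MLlib code.


def gaussian_similarity(points, sigma):
    """Assumed fully connected graph: w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), w_ii = 0."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def unnormalized_laplacian(w):
    """L = D - W, where D is the diagonal matrix of row sums (degrees) of W."""
    n = len(w)
    return [[(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)] for i in range(n)]


def jacobi_eigen(a, max_sweeps=100, tol=1e-12):
    """Cyclic Jacobi method for a symmetric matrix.

    Returns (eigenvalues, V); column j of V is the eigenvector for eigenvalues[j].
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that entry (p, q) becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J (accumulate eigenvectors)
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def dist2(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeans(rows, k, iters=50):
    """Lloyd's k-means with deterministic farthest-first initialization (an assumption)."""
    centers = [list(rows[0])]
    while len(centers) < k:
        far = max(rows, key=lambda r: min(dist2(r, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(r, centers[c])) for r in rows]
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def spectral_clustering(points, k, sigma=1.0):
    w = gaussian_similarity(points, sigma)  # W = S (fully connected graph, assumed)
    lap = unnormalized_laplacian(w)         # L = D - W
    eigvals, vecs = jacobi_eigen(lap)
    # "First k" eigenvectors = those with the k smallest eigenvalues.
    order = sorted(range(len(points)), key=lambda j: eigvals[j])[:k]
    # Row i of U (n x k) is the embedded point y_i.
    u_rows = [[vecs[i][j] for j in order] for i in range(len(points))]
    return kmeans(u_rows, k)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = spectral_clustering(points, k=2, sigma=1.0)
print(labels)
```

On this toy input (two well-separated groups of three points), L has two eigenvalues close to zero whose eigenvectors are close to indicator vectors of the groups, so k-means on the rows of U should put the first three points in one cluster and the last three in the other. The dense eigen-decomposition costs O(n^3) time, which is one reason scalable alternatives such as PIC avoid computing eigenvectors explicitly.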
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)