[
https://issues.apache.org/jira/browse/SPARK-4259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xiangrui Meng updated SPARK-4259:
---------------------------------
Shepherd: Xiangrui Meng
> Add Spectral Clustering Algorithm with Gaussian Similarity Function
> -------------------------------------------------------------------
>
> Key: SPARK-4259
> URL: https://issues.apache.org/jira/browse/SPARK-4259
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Reporter: Fan Jiang
> Assignee: Fan Jiang
> Labels: features
>
> In recent years, spectral clustering has become one of the most popular
> modern clustering algorithms. It is simple to implement, can be solved
> efficiently by standard linear algebra software, and very often outperforms
> traditional clustering algorithms such as the k-means algorithm.
> We implemented spectral clustering based on the unnormalized graph
> Laplacian, with similarities computed by a Gaussian function. The design
> is outlined below:
> Unnormalized spectral clustering
> Input: raw data points, number k of clusters to construct:
> • Compute the similarity matrix S ∈ R^(n×n) using the Gaussian similarity
> function.
> • Construct a similarity graph. Let W be its weighted adjacency matrix.
> • Compute the unnormalized Laplacian L = D - W, where D is the diagonal
> degree matrix.
> • Compute the first k eigenvectors u_1, ..., u_k of L (those with the k
> smallest eigenvalues).
> • Let U ∈ R^(n×k) be the matrix containing the vectors u_1, ..., u_k as
> columns.
> • For i = 1, ..., n, let y_i ∈ R^k be the vector corresponding to the i-th
> row of U.
> • Cluster the points (y_i), i = 1, ..., n, in R^k with the k-means algorithm
> into clusters C_1, ..., C_k.
> Output: Clusters A_1, ..., A_k with A_i = { j | y_j ∈ C_i }.
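
The steps listed in the description can be sketched on a single machine as
follows. This is only an illustrative NumPy sketch, not the proposed MLlib
implementation. Several details are assumptions the issue does not specify:
the similarity uses the common Gaussian form
s_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2)), the similarity graph is fully
connected with self-loops removed, the bandwidth parameter sigma and all
function names are hypothetical, and k-means uses a simple deterministic
farthest-point initialization.

```python
import numpy as np


def gaussian_similarity(X, sigma=1.0):
    """Similarity matrix S with s_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    diffs = X[:, None, :] - X[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def unnormalized_laplacian(W):
    """L = D - W, where D is the diagonal degree matrix of W."""
    D = np.diag(W.sum(axis=1))
    return D - W


def kmeans(Y, k, n_iter=100):
    """Plain Lloyd's k-means with farthest-point initialization (assumed)."""
    centers = [Y[0]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest chosen center.
        d = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old center if a cluster ends up empty.
        new_centers = np.array([
            Y[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_clustering(X, k, sigma=1.0):
    """Unnormalized spectral clustering; returns one cluster label per row."""
    S = gaussian_similarity(X, sigma)
    W = S.copy()
    np.fill_diagonal(W, 0.0)          # fully connected graph, no self-loops
    L = unnormalized_laplacian(W)
    # eigh returns eigenvalues in ascending order for symmetric matrices,
    # so the first k columns belong to the k smallest eigenvalues.
    _, eigvecs = np.linalg.eigh(L)
    U = eigvecs[:, :k]                # row i of U is the embedded point y_i
    return kmeans(U, k)


if __name__ == "__main__":
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                  [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
    print(spectral_clustering(X, k=2))
```

The dense n×n matrices and the full eigendecomposition used here only suit
small inputs; a distributed version would need a different strategy for both.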