[jira] [Updated] (SPARK-4259) Add Power Iteration Clustering Algorithm with Gaussian Similarity Function

Fan Jiang (JIRA) Wed, 28 Jan 2015 13:42:00 -0800

     [ 
https://issues.apache.org/jira/browse/SPARK-4259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Fan Jiang updated SPARK-4259:
-----------------------------
    Description: 
In recent years, spectral clustering has become one of the most popular modern 
clustering algorithms. It is simple to implement, can be solved efficiently by 
standard linear algebra software, and very often outperforms traditional 
clustering algorithms such as the k-means algorithm.

Power iteration clustering is a scalable and efficient algorithm for clustering 
points given pointwise mutual affinity values.  Internally the algorithm:

computes the Gaussian distance between all pairs of points and represents these 
distances in an Affinity Matrix
calculates a Normalized Affinity Matrix
calculates the principal eigenvalue and eigenvector
Clusters each of the input points according to their principal eigenvector 
component value

Details of this algorithm are found within [Power Iteration Clustering, Lin and 
Cohen]{www.icml2010.org/papers/387.pdf}


  was:
In recent years, spectral clustering has become one of the most popular modern 
clustering algorithms. It is simple to implement, can be solved efficiently by 
standard linear algebra software, and very often outperforms traditional 
clustering algorithms such as the k-means algorithm.

We implemented the unnormalized graph Laplacian matrix by Gaussian similarity 
function. A brief design looks like below:

Unnormalized spectral clustering

Input: raw data points, number k of clusters to construct: 

• Comupte Similarity matrix S ∈ Rn×n, .
• Construct a similarity graph. Let W be its weighted adjacency matrix.
• Compute the unnormalized Laplacian L = D - W. where D is the Degree diagonal 
matrix
• Compute the first k eigenvectors u1, . . . , uk of L.
• Let U ∈ Rn×k be the matrix containing the vectors u1, . . . , uk as columns.
• For i = 1, . . . , n, let yi ∈ Rk be the vector corresponding to the i-th row 
of U.
• Cluster the points (yi)i=1,...,n in Rk with the k-means algorithm into 
clusters C1, . . . , Ck.

Output: Clusters A1, . . . , Ak with Ai = { j | yj ∈ Ci }.



        Summary: Add Power Iteration Clustering Algorithm with Gaussian 
Similarity Function  (was: Add Spectral Clustering Algorithm with Gaussian 
Similarity Function)

> Add Power Iteration Clustering Algorithm with Gaussian Similarity Function
> --------------------------------------------------------------------------
>
>                 Key: SPARK-4259
>                 URL: https://issues.apache.org/jira/browse/SPARK-4259
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Fan Jiang
>            Assignee: Fan Jiang
>              Labels: features
>
> In recent years, spectral clustering has become one of the most popular 
> modern clustering algorithms. It is simple to implement, can be solved 
> efficiently by standard linear algebra software, and very often outperforms 
> traditional clustering algorithms such as the k-means algorithm.
> Power iteration clustering is a scalable and efficient algorithm for 
> clustering points given pointwise mutual affinity values.  Internally the 
> algorithm:
> computes the Gaussian distance between all pairs of points and represents 
> these distances in an Affinity Matrix
> calculates a Normalized Affinity Matrix
> calculates the principal eigenvalue and eigenvector
> Clusters each of the input points according to their principal eigenvector 
> component value
> Details of this algorithm are found within [Power Iteration Clustering, Lin 
> and Cohen]{www.icml2010.org/papers/387.pdf}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-4259) Add Power Iteration Clustering Algorithm with Gaussian Similarity Function

Reply via email to