Space: Apache Mahout (https://cwiki.apache.org/confluence/display/MAHOUT)
Page: Spectral Clustering 
(https://cwiki.apache.org/confluence/display/MAHOUT/Spectral+Clustering)


Edited by Shannon Quinn:
---------------------------------------------------------------------
Spectral clustering is a more powerful and specialized algorithm than K-means, 
with significant use in image segmentation. Its name refers to the spectrum 
(the eigenvalues) of a matrix derived from the data. Each object to be 
clustered can initially be represented as an _n_\-dimensional numeric vector, 
but unlike K-means, this algorithm also requires some method for comparing 
each pair of objects and expressing that comparison as a scalar.

Comparing every object with every other produces the _affinity_ matrix, a 
square matrix with one row and one column per object. It can be thought of as 
a representation of an underlying undirected, weighted, and fully-connected 
graph whose edges express the relative relationships, or affinities, between 
each pair of objects in the original data. This affinity matrix is the basis on 
which the two spectral clustering algorithms described on this page operate.

The equation by which the affinities are calculated can vary depending on the 
user's circumstances; typically, the equation takes the form of:

exp( -_d_^2^ / _c_ )

where _d_ is the Euclidean distance between a pair of points and _c_ is a 
scaling factor, so that affinity decays as distance grows. _c_ is often 
calculated relative to the _k_\-neighborhood of points closest to the current 
point; affinities to points outside the neighborhood are set to 0. Again, this 
formula can vary depending on the situation (e.g. a fully-connected graph would 
ignore the _k_\-neighborhood and calculate affinities for all pairs of points).
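
As a rough illustration of the affinity calculation described above, the 
following sketch builds the matrix of exp(-d^2 / c) values in plain Python. 
The function name {{affinity_matrix}}, the fixed value of _c_, and the way the 
_k_\-neighborhood is symmetrized are all assumptions made for this example; 
they are not taken from Mahout.

```python
import math

def affinity_matrix(points, c=1.0, k=None):
    """Return the matrix of affinities exp(-d^2 / c) for a list of points.

    Assumption: c is a fixed constant here. If k is given, an affinity is
    kept only when one point is among the k nearest neighbours of the other
    (this keeps the matrix symmetric); all other affinities are set to 0.
    """
    n = len(points)
    # Squared Euclidean distance between every pair of points.
    d2 = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
          for p in points]
    A = [[math.exp(-d2[i][j] / c) for j in range(n)] for i in range(n)]
    if k is not None:
        keep = [[False] * n for _ in range(n)]
        for i in range(n):
            # Indices of the k points closest to point i, excluding i itself.
            nearest = sorted((j for j in range(n) if j != i),
                             key=lambda j: d2[i][j])[:k]
            for j in nearest:
                keep[i][j] = keep[j][i] = True
        A = [[A[i][j] if (i == j or keep[i][j]) else 0.0
              for j in range(n)] for i in range(n)]
    return A

# Two well-separated pairs of 2-dimensional points (made-up data).
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
A_full = affinity_matrix(points, c=2.0)       # fully-connected graph
A_knn = affinity_matrix(points, c=2.0, k=1)   # 1-neighborhood only
```

In the fully-connected case every pair keeps a (possibly tiny) positive 
affinity, while the neighborhood variant zeroes out the entries linking the 
two distant pairs.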

[Full overview on spectral 
clustering|http://spectrallyclustered.wordpress.com/2010/05/27/intro-and-spectral-clustering-101/]

h2. K-Means Spectral Clustering

h3. Overview

This algorithm consists of a few basic steps of generalized spectral 
clustering, followed by standard K-means clustering over the intermediate 
results. Again, the process begins with an affinity matrix *A*; whether or not 
it is fully-connected depends on the user's needs.

*A* is then transformed into a pseudo-Laplacian matrix *L*. Let *D* be the 
diagonal matrix whose entries are the sums of the rows of *A*. Each diagonal 
entry is replaced by the inverse square root of its original value, and *A* is 
multiplied by the resulting matrix on both sides. The final operation looks 
something like:

L = D^{-1/2} A D^{-1/2}
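
A minimal sketch of this normalization in plain Python follows. The function 
name {{normalized_laplacian}} and the small example matrix are hypothetical, 
and the code assumes every row of *A* has a strictly positive sum.

```python
import math

def normalized_laplacian(A):
    """Compute L = D^{-1/2} A D^{-1/2}, where D holds the row sums of A."""
    n = len(A)
    # d_inv_sqrt[i] is the i-th diagonal entry of D^{-1/2}.
    d_inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in A]
    # Multiplying by a diagonal matrix on both sides scales entry (i, j)
    # by the i-th and j-th diagonal entries.
    return [[d_inv_sqrt[i] * A[i][j] * d_inv_sqrt[j] for j in range(n)]
            for i in range(n)]

# A small symmetric affinity matrix (made-up values).
A = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.2],
     [0.0, 0.2, 1.0]]
L = normalized_laplacian(A)
```

Because *D* is diagonal, no full matrix product is needed: each entry of *A* 
is simply scaled by two numbers, and the symmetry of *A* carries over to *L*.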

*L* has some properties that are of interest to us; most importantly, while it 
is symmetric like *A*, it has a more stable eigen-decomposition. *L* is 
decomposed into its constituent eigenvectors and corresponding eigenvalues 
(though the latter will not be needed for future calculations); the matrix of 
eigenvectors, *U*, is what we are now interested in.
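
To make the decomposition step concrete, here is a small sketch using cyclic 
Jacobi rotations, a classical method for symmetric matrices. It was chosen 
only because it fits in a few lines of plain Python; it is not a claim about 
the method Mahout uses, and the name {{jacobi_eigen}} and the example matrix 
are hypothetical.

```python
import math

def jacobi_eigen(S, sweeps=50, tol=1e-20):
    """Eigendecomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, U), where column j of U is the eigenvector
    belonging to eigenvalues[j]. Eigenvalues are not sorted.
    """
    n = len(S)
    M = [row[:] for row in S]  # working copy, driven towards a diagonal matrix
    U = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        # Stop once the off-diagonal part is numerically zero.
        off = sum(M[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if M[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the entry M[p][q].
                diff = M[q][q] - M[p][p]
                if diff == 0:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * M[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # M <- M J (columns p and q)
                    mp, mq = M[k][p], M[k][q]
                    M[k][p], M[k][q] = c * mp - s * mq, s * mp + c * mq
                for k in range(n):  # M <- J^T M (rows p and q)
                    mp, mq = M[p][k], M[q][k]
                    M[p][k], M[q][k] = c * mp - s * mq, s * mp + c * mq
                for k in range(n):  # U <- U J accumulates the eigenvectors
                    up, uq = U[k][p], U[k][q]
                    U[k][p], U[k][q] = c * up - s * uq, s * up + c * uq
    return [M[i][i] for i in range(n)], U

# A small symmetric matrix standing in for L (made-up values).
L = [[2.0, 1.0],
     [1.0, 2.0]]
eigenvalues, U = jacobi_eigen(L)
```

For realistically sized data a tested numerical library would be the sensible 
choice; the sketch only shows what "decomposing *L* into eigenvectors" 
produces, namely a matrix *U* with one eigenvector per column.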

Assuming the eigenvectors form the columns of *U*, we now use the _rows_ of 
*U* as proxy data for the original data points, with one row standing in for 
each original object. The rows are clustered with standard K-means, and the 
label that each proxy point receives is assigned directly to the corresponding 
original data point, resulting in the final clustering assignments.
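
The last step can be sketched as follows. The matrix {{U}} below is made up 
for illustration rather than computed from real data, the object names in 
{{original_ids}} are hypothetical, and the deterministic farthest-point 
seeding is an assumption made so the example is reproducible; standard 
K-means is usually seeded randomly.

```python
def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, iterations=20):
    """Plain K-means (Lloyd's algorithm); returns one label per row."""
    # Seeding: start from the first row, then repeatedly add the row that
    # is farthest from all centers chosen so far.
    centers = [list(rows[0])]
    while len(centers) < k:
        far = max(rows, key=lambda r: min(dist2(r, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: each row joins its nearest center.
        labels = [min(range(k), key=lambda j: dist2(r, centers[j]))
                  for r in rows]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Each row of U is the proxy for one original object (made-up values).
U = [[0.70, 0.10],
     [0.72, 0.08],
     [0.05, 0.69],
     [0.07, 0.71]]
original_ids = ["obj0", "obj1", "obj2", "obj3"]  # hypothetical object names

labels = kmeans(U, k=2)
# The label of proxy row i becomes the cluster of original object i.
assignments = dict(zip(original_ids, labels))
```

The original vectors never enter the K-means step; only the row order links 
each proxy point back to its object.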

[Full overview on k-means spectral 
clustering|http://spectrallyclustered.wordpress.com/2010/06/05/sprint-1-k-means-spectral-clustering/]

h3. Implementation

h2. Eigencuts Spectral Clustering

h3. Overview

[Full overview on Eigencuts spectral 
clustering|http://spectrallyclustered.wordpress.com/2010/07/06/sprint-3-introduction-to-eigencuts/]

h3. Implementation

h2. Quickstart

h2. Examples
