Space: Apache Mahout (https://cwiki.apache.org/confluence/display/MAHOUT)
Page: Visualizing Sample Clusters 
(https://cwiki.apache.org/confluence/display/MAHOUT/Visualizing+Sample+Clusters)


Edited by Lance Norskog:
---------------------------------------------------------------------
h1. Introduction

Mahout provides examples that visualize the sample clusters created by 
various clustering algorithms, such as:
* Canopy Clustering
* Dirichlet Process
* KMeans
* Fuzzy KMeans
* MeanShift Canopy
* Spectral KMeans

h1. Pre-Prep

To visualize the clusters, run the Java classes in the 
org.apache.mahout.clustering.display package of the mahout-examples module. 
If you are using Eclipse, set up mahout-examples as a project as described in 
[Working with Maven in Eclipse|BuildingMahout#mahout_maven_eclipse].

h1. Visualizing clusters

The following classes in org.apache.mahout.clustering.display can be run 
without parameters to generate a sample data set and run the reference 
clustering implementations over it:
# DisplayClustering - generates 1000 samples from three symmetric 
distributions. This is the same data set that is used by the following 
clustering programs. It displays the points on screen and superimposes the 
model parameters (the initial areas) that were used to generate the points. 
You can edit the generateSamples() method to change the sample points used by 
these programs.
# DisplayDirichlet - uses Dirichlet Process clustering
# DisplayCanopy - uses Canopy clustering
# DisplayKMeans - uses k-Means clustering
# DisplayFuzzyKMeans - uses Fuzzy k-Means clustering
# DisplayMeanShift - uses MeanShift clustering
# DisplaySpectralKMeans - uses Spectral k-Means clustering via a map-reduce algorithm
## Doesn't work yet: see [MAHOUT-524|https://issues.apache.org/jira/browse/MAHOUT-524]

If you are using Eclipse and have set it up as described in Pre-Prep, 
right-click each of the classes listed above and choose "Run As - Java 
Application". To run them directly from the command line:
{code}
cd $MAHOUT_HOME/examples
mvn -q exec:java -Dexec.mainClass=org.apache.mahout.clustering.display.DisplayClustering
# substitute other names above for DisplayClustering
# at this writing, DisplaySpectralKMeans does not work
{code}
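
To avoid retyping the command for every class, the substitution described in the comment above can be wrapped in a small shell function. This is only a sketch: the names run_display, run_all_displays, DISPLAY_PKG, DISPLAY_CLASSES and DRY_RUN are hypothetical helpers introduced here and are not part of Mahout. It assumes you have already changed into $MAHOUT_HOME/examples, and it leaves out DisplaySpectralKMeans because of MAHOUT-524.

```shell
# Hypothetical helpers, not part of Mahout. Run from $MAHOUT_HOME/examples.
DISPLAY_PKG=org.apache.mahout.clustering.display

# Display classes listed on this page; DisplaySpectralKMeans is omitted (MAHOUT-524).
DISPLAY_CLASSES="DisplayClustering DisplayDirichlet DisplayCanopy DisplayKMeans DisplayFuzzyKMeans DisplayMeanShift"

# Run one display class by name. If DRY_RUN is set to a non-empty value,
# the Maven command is printed instead of executed.
run_display() {
  ${DRY_RUN:+echo} mvn -q exec:java "-Dexec.mainClass=${DISPLAY_PKG}.$1"
}

# Run every class in DISPLAY_CLASSES, one after another.
run_all_displays() {
  for cls in $DISPLAY_CLASSES; do
    run_display "$cls"
  done
}
```

For example, "run_display DisplayKMeans" starts the k-Means display, and "DRY_RUN=1 run_all_displays" prints the six Maven commands without starting anything.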

Note:
* Some of these programs display the sample points and then superimpose all of 
the clusters from each iteration. The last iteration's clusters are in bold red 
and the previous several are colored (orange, yellow, green, blue, magenta) in 
order, after which all earlier clusters are in light grey. This helps to 
visualize how the clusters converge upon a solution over multiple iterations.

* By changing the parameter values (k, ALPHA_0, numIterations) and the display 
SIGNIFICANCE you can obtain different results.
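
The color scheme in the first note can be read as a lookup from "how many iterations before the last one" to a color. The sketch below only illustrates that reading and is not taken from the Mahout display code: cluster_color is a hypothetical function, and it assumes that "in order" means orange is the iteration just before the last, yellow the one before that, and so on.

```shell
# Hypothetical illustration of the color scheme described above.
# $1 = number of iterations before the last one (0 = last iteration).
cluster_color() {
  case "$1" in
    0) echo "bold red" ;;
    1) echo "orange" ;;
    2) echo "yellow" ;;
    3) echo "green" ;;
    4) echo "blue" ;;
    5) echo "magenta" ;;
    *) echo "light grey" ;;   # all earlier iterations
  esac
}
```

Under that assumption, "cluster_color 0" gives the color of the final clusters and any argument above 5 falls through to light grey.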

h1. Screen Capture Animation
See [Sample Clusters Animation] for screen captures of all of the above 
programs, and an animated GIF.

