Space: Apache Mahout (https://cwiki.apache.org/confluence/display/MAHOUT) Page: Visualizing Sample Clusters (https://cwiki.apache.org/confluence/display/MAHOUT/Visualizing+Sample+Clusters)
Change Comment: Explains how to visualize the sample clusters generated by each of the available clustering algorithms.
Edited by Joe Prasanna Kumar

h1. Introduction

Mahout provides examples that visualize the sample clusters created by various clustering algorithms, such as:
* Canopy Clustering
* Dirichlet
* k-Means
* Fuzzy k-Means
* MeanShift

h1. Pre-Prep

To visualize the clusters, execute the Java classes in the org.apache.mahout.clustering.display package of the mahout-examples module. If you are using Eclipse, set up mahout-examples as a project as described in [Working with Maven in Eclipse|#BuildingMahout-WorkingWithMaveninEclipse].

h1. Visualizing clusters

The following classes in org.apache.mahout.clustering.display can be run without parameters to generate a sample data set and run the reference clustering implementations over it:
# DisplayClustering - generates 1000 samples from three symmetric distributions. This is the same data set that is used by the following clustering programs. It displays the points on a screen and superimposes the model parameters that were used to generate the points. You can edit the generateSamples() method to change the sample points used by these programs.
# DisplayDirichlet - uses Dirichlet Process clustering
# DisplayCanopy - uses Canopy clustering
# DisplayKMeans - uses k-Means clustering
# DisplayFuzzyKMeans - uses Fuzzy k-Means clustering
# DisplayMeanShift - uses MeanShift clustering

If you are using Eclipse and have set it up as described in Pre-Prep, right-click each of the classes listed above and choose "Run As - Java Application".

Note:
* Some of these programs display the sample points and then superimpose all of the clusters from each iteration. The last iteration's clusters are drawn in bold red, the several iterations before it are colored orange, yellow, green, blue and magenta in that order, and all earlier clusters are drawn in light grey. This helps to visualize how the clusters converge upon a solution over multiple iterations.
* By changing the parameter values (k, ALPHA_0, numIterations) and the display SIGNIFICANCE, you can obtain different results.
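
The coloring rule described in the note can be sketched in a few lines of Java. This is a minimal illustration only, not the code used by the Mahout display classes: the class IterationColors and the method colorFor are hypothetical names, and the sketch assumes that "in order" means the iteration just before the last one is orange, the one before that yellow, and so on. The bold stroke used for the last iteration is not modeled.

```java
import java.awt.Color;

/** Hypothetical helper (not part of Mahout): picks a display color for one iteration's clusters. */
final class IterationColors {
    // Colors for the iterations just before the last one, newest first (assumed ordering).
    private static final Color[] RECENT = {
        Color.ORANGE, Color.YELLOW, Color.GREEN, Color.BLUE, Color.MAGENTA
    };

    private IterationColors() {}

    /**
     * Returns the color for the clusters of a given iteration.
     *
     * @param iteration zero-based index of the iteration
     * @param total     total number of iterations that were run
     */
    static Color colorFor(int iteration, int total) {
        if (iteration < 0 || iteration >= total) {
            throw new IllegalArgumentException("iteration out of range: " + iteration);
        }
        int age = total - 1 - iteration; // 0 means the last iteration
        if (age == 0) {
            return Color.RED;            // last iteration
        }
        if (age <= RECENT.length) {
            return RECENT[age - 1];      // the previous several iterations
        }
        return Color.LIGHT_GRAY;         // everything earlier
    }

    public static void main(String[] args) {
        int total = 8;
        for (int i = 0; i < total; i++) {
            System.out.println("iteration " + i + " -> " + colorFor(i, total));
        }
    }
}
```

With eight iterations, the first two map to light grey, the next five to magenta through orange, and the final one to red, so the most recent clusters stand out from the history behind them.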

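
To make the sample data set more concrete, the following sketch draws 1000 two-dimensional points from three symmetric normal distributions. It is not the Mahout generateSamples() method: the class SampleGenerator, the 500/300/200 split, and the centers and standard deviations are all illustrative assumptions, chosen only to show the kind of values you might change when editing the real method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical stand-in for a sample generator; not the Mahout implementation. */
final class SampleGenerator {
    private SampleGenerator() {}

    /**
     * Appends n points drawn from a symmetric normal distribution centered at
     * (mx, my) with the same standard deviation sd on both axes.
     */
    static void generateSamples(List<double[]> out, Random rng,
                                int n, double mx, double my, double sd) {
        for (int i = 0; i < n; i++) {
            double x = mx + sd * rng.nextGaussian();
            double y = my + sd * rng.nextGaussian();
            out.add(new double[] {x, y});
        }
    }

    /** Builds the full 1000-point data set from three distributions. */
    static List<double[]> generateAll(long seed) {
        Random rng = new Random(seed);
        List<double[]> points = new ArrayList<>(1000);
        // Assumed split, centers and spreads; adjust these to change the data set.
        generateSamples(points, rng, 500, 1.0, 1.0, 3.0);
        generateSamples(points, rng, 300, 1.0, 0.0, 0.5);
        generateSamples(points, rng, 200, 0.0, 2.0, 0.1);
        return points;
    }

    public static void main(String[] args) {
        List<double[]> points = generateAll(42L);
        System.out.println("generated " + points.size() + " points");
    }
}
```

Using a fixed seed keeps the data set reproducible between runs, which makes it easier to compare how different clustering algorithms behave on the same points.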