What do your input vectors look like? How many canopies did you get in clusters-0?
-----Original Message-----
From: eric skinner [mailto:[email protected]]
Sent: Wednesday, August 10, 2011 8:33 AM
To: [email protected]
Subject: issues on Mahout clustering result using K-means

I ran the K-means clustering algorithm against a set of sequence files. However, the generated result looks like this:

0 belongs to cluster 1.0: []
0 belongs to cluster 1.0: []
0 belongs to cluster 1.0: []
0 belongs to cluster 1.0: []
0 belongs to cluster 1.0: []
0 belongs to cluster 1.0: []

Could you let me know why I get this kind of result? Is it caused by a specific parameter setting, or by something else?

The program I use is borrowed from NewsKMeansClustering.java, an example given in chapter 9 of Mahout in Action. The core clustering code in this program is:

    CanopyDriver.run(vectorsFolder, canopyCentroids, new EuclideanDistanceMeasure(), 250, 120, false, false);
    KMeansDriver.run(conf, vectorsFolder, new Path(canopyCentroids, "clusters-0"), clusterOutput, new TanimotoDistanceMeasure(), 0.01, 20, true, false);
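Some background on the canopy question: the number of canopies depends on how the T1/T2 thresholds (250 and 120 in the call above) compare with the actual distances between your vectors. If every pairwise Euclidean distance is well below T2, canopy generation may produce only a single canopy, and k-means seeded from clusters-0 would then start from one centroid, so every point lands in the same cluster. The sketch below is a simplified, in-memory version of greedy canopy clustering, written only to illustrate that effect. It is NOT Mahout's CanopyDriver. The class name, method names and the sample vectors are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified in-memory canopy clustering (greedy T1/T2 variant).
 * Not Mahout's implementation; it only shows how T1/T2 drive the canopy count.
 */
public class CanopySketch {

    /** Euclidean distance between two dense vectors of equal length. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Returns canopies as lists of point indices. Points closer than t1 to a
     * center join its canopy (canopies may overlap); points closer than t2
     * (t2 < t1) are removed from the candidate list and never become centers.
     */
    static List<List<Integer>> canopies(double[][] points, double t1, double t2) {
        List<Integer> remaining = new ArrayList<>();
        for (int i = 0; i < points.length; i++) {
            remaining.add(i);
        }
        List<List<Integer>> result = new ArrayList<>();
        while (!remaining.isEmpty()) {
            // Take the next candidate as a new canopy center.
            int center = remaining.remove(0);
            List<Integer> canopy = new ArrayList<>();
            canopy.add(center);
            for (int i = 0; i < points.length; i++) {
                if (i != center && euclidean(points[center], points[i]) < t1) {
                    canopy.add(i);
                }
            }
            // Points strongly bound to this center cannot start their own canopy.
            remaining.removeIf(p -> euclidean(points[center], points[p]) < t2);
            result.add(canopy);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up, TF-IDF-like vectors: two obvious groups, small distances.
        double[][] docs = {
            {0.0, 1.2, 0.0, 3.4},
            {0.1, 1.0, 0.0, 3.0},
            {5.0, 0.0, 2.2, 0.0},
            {4.8, 0.3, 2.0, 0.0}
        };
        // Thresholds from the quoted call: all distances here are far below T2 = 120.
        System.out.println("T1=250, T2=120 -> " + canopies(docs, 250, 120).size() + " canopies"); // 1
        // Thresholds scaled to the data separate the two groups.
        System.out.println("T1=3.0, T2=1.5 -> " + canopies(docs, 3.0, 1.5).size() + " canopies"); // 2
    }
}
```

With the quoted thresholds this toy data collapses into one canopy; with thresholds on the scale of the actual distances it yields two. Printing a few of your input vectors, together with the number of entries in clusters-0, would show whether something similar is happening in your run.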
