Hi,
How can we find out which input points belong to a given cluster in the result of StreamingKMeans? This is needed to evaluate the clustering result, so I think it should be considered as an improvement.

I know how to interpret the k-means result in Mahout 0.7 using the NamedVector class and one of the dumpers (such as ClusterDumper). After clustering with the k-means driver, a directory named clusteredPoints is created that contains the clustering result, and with ClusterDumper you can see the created clusters and the points in each one. The link below has a good solution for this: How to read Mahout clustering output<http://stackoverflow.com/questions/11848038/how-to-read-mahout-clustering-output>

But, as I mentioned in the title, I want the same capability for interpreting the result of Streaming KMeans, which is a new feature in Mahout 0.8. This feature uses a Centroid class to hold the data points and the seed of each cluster. The output of the StreamingKMeans algorithm is only a sequence file made up of the centroid vectors plus the key and weight of each cluster. There is no information about the input data points in this output, so I cannot tell how they are distributed among the clusters. As a result, I cannot get a sense of the accuracy of the clustering.

So, how can I get this information in the clustering output? Is it already implemented, or is there any plan to implement this feature?
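To make concrete what I am after: the mapping I would like to see in the output is roughly what the sketch below computes by hand, assigning each input point to its nearest final centroid. This is only an illustration under my own assumptions. It uses plain double arrays and squared Euclidean distance, and the class NearestCentroidAssigner and its methods are hypothetical names, not part of Mahout. Reading the centroids from the StreamingKMeans sequence file is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Mahout: plain arrays stand in for Mahout vectors.
final class NearestCentroidAssigner {

    // Squared Euclidean distance between two vectors of equal length.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the centroid closest to the point (ties go to the lowest index).
    // Returns -1 if there are no centroids.
    static int nearest(double[] point, double[][] centroids) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double d = squaredDistance(point, centroids[c]);
            if (d < bestDistance) {
                bestDistance = d;
                best = c;
            }
        }
        return best;
    }

    // For each centroid, the indices of the input points assigned to it.
    // Assumes at least one centroid is given.
    static List<List<Integer>> assign(double[][] points, double[][] centroids) {
        List<List<Integer>> clusters = new ArrayList<>();
        for (int c = 0; c < centroids.length; c++) {
            clusters.add(new ArrayList<>());
        }
        for (int p = 0; p < points.length; p++) {
            clusters.get(nearest(points[p], centroids)).add(p);
        }
        return clusters;
    }
}
```

Calling NearestCentroidAssigner.assign(points, centroids) returns one list of point indices per centroid, which is the kind of per-cluster membership that clusteredPoints gives for regular k-means. Whether this matches the assignment StreamingKMeans makes internally is something I do not know.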
