On Nov 18, 2009, at 1:51 AM, karthikeyan palanisamy wrote:

> Hi Mahout Team,
>
> Thank you for Mahout, and for making it open source. I want to use the
> results of Mahout for a research application that I am working on. I am
> trying to compare the results obtained from the k-means, Mean-Shift
> and Dirichlet algorithms. I see that the clustering drivers take
> sparse vectors as their input and give keyword-id/cluster-id pairs as
> their output. Please help me retrieve the actual words from the
> keyword IDs (integers). Please brief me on how I can obtain the words
> corresponding to the integers.

Part of it is going to depend on how you created the vectors. If you created them from Lucene using the tools in the utils module, then you should have a dictionary file that maps each term to its integer index. If you created them on your own, you need to maintain that mapping yourself.

FWIW, have a look at the ClusterDumper class in the utils submodule. There are also the SequenceFileDumper and the VectorDumper, which may come in handy.

-Grant
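
To illustrate the dictionary lookup described above, here is a minimal sketch in Python. It is not Mahout code. It assumes the dictionary is a plain-text, tab-separated file whose first field is the term and whose last field is the integer index, possibly preceded by a count line and a header line. That layout, the sample terms, and the helper names `load_dictionary` and `terms_for` are all assumptions for illustration, so check them against the file you actually have.

```python
import io


def load_dictionary(lines):
    """Build {index: term} from tab-separated lines.

    Assumed layout (hypothetical, verify against your own file):
    term <TAB> ... <TAB> integer index. Lines that do not fit,
    such as a count line or a header, are skipped.
    """
    id_to_term = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # e.g. a leading line holding only the term count
        try:
            index = int(fields[-1])
        except ValueError:
            continue  # e.g. a header line whose last field is not a number
        id_to_term[index] = fields[0]
    return id_to_term


def terms_for(ids, id_to_term):
    """Translate integer keyword IDs back to words, flagging unknown IDs."""
    return [id_to_term.get(i, "<unknown:%d>" % i) for i in ids]


# Made-up sample data standing in for a real dictionary file.
sample = io.StringIO(
    "3\n"
    "#term\tdoc freq\tidx\n"
    "clustering\t12\t0\n"
    "mahout\t7\t1\n"
    "vector\t9\t2\n"
)
id_to_term = load_dictionary(sample)
print(terms_for([2, 0, 5], id_to_term))
```

With a real file, pass an open file handle instead of the `io.StringIO` sample. If you built the vectors yourself, the same lookup works with whatever term-to-index mapping you saved when you created them.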
