Re: Help needed for identifying the Clustered words

Grant Ingersoll Wed, 18 Nov 2009 06:25:45 -0800

On Nov 18, 2009, at 1:51 AM, karthikeyan palanisamy wrote:

> Hi Mahout Team,
> 
>     Thank you for Mahout, and for making it open source. I want to use the
> results of Mahout for a research application that I am working on. I am
> trying to look into and compare the results obtained from the k-means,
> Mean-Shift and Dirichlet algorithms. I see that the clustering drivers take
> sparse vectors as their input and give Keyword-id---Cluster-id pairs as
> their output. Please help me retrieve the actual words from the
> keyword-ids (integers). Please brief me on how I can obtain the words
> corresponding to the integers.



Part of it is going to depend on how you created the vectors.  If you created
them from Lucene using the stuff in the utils module, then you should have a
dictionary file that does the term-to-index mapping.  If you created them on
your own, you need to maintain that mapping yourself.
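
To illustrate the "maintain the mapping" case, here is a minimal sketch in
plain Python.  It does not use Mahout's API: TermDictionary, vectorize and
words_by_cluster are hypothetical names, and the (keyword-id, cluster-id)
pairs are made up for the example.  The idea is only that the same dictionary
that hands out indices when the vectors are built is used afterwards to turn
the indices back into words.

```python
from collections import defaultdict


class TermDictionary:
    """Assigns a stable integer index to each distinct term (hypothetical helper)."""

    def __init__(self):
        self.term_to_id = {}
        self.id_to_term = []

    def index_of(self, term):
        # Reuse the existing index, or hand out the next free one.
        if term not in self.term_to_id:
            self.term_to_id[term] = len(self.id_to_term)
            self.id_to_term.append(term)
        return self.term_to_id[term]

    def term_of(self, index):
        # Reverse lookup: integer index back to the original word.
        return self.id_to_term[index]


def vectorize(tokens, dictionary):
    """Build a sparse term-frequency vector as {index: count}."""
    vector = defaultdict(int)
    for token in tokens:
        vector[dictionary.index_of(token)] += 1
    return dict(vector)


def words_by_cluster(pairs, dictionary):
    """Turn (keyword_id, cluster_id) pairs into {cluster_id: [words]}."""
    clusters = defaultdict(list)
    for keyword_id, cluster_id in pairs:
        clusters[cluster_id].append(dictionary.term_of(keyword_id))
    return dict(clusters)


dictionary = TermDictionary()
doc_vector = vectorize(["apache", "mahout", "clustering", "mahout"], dictionary)

# Made-up clustering output, for illustration only.
pairs = [(0, 7), (1, 7), (2, 3)]
resolved = words_by_cluster(pairs, dictionary)
```

In practice the dictionary would have to be written out alongside the
vectors, since the indices mean nothing once the process that assigned them
has exited.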

FWIW, have a look at the ClusterDumper class in the utils submodule.   There is 
also the SequenceFileDumper and the VectorDumper which may come in handy.

-Grant
