You will need to write a small cluster dumper utility, like the two below, that reads from your cluster files and produces some appropriate output. Are you using one of the Mahout models or rolling your own? If you happen to be using SampledNormalDistribution, then DisplayOutputState should work out of the box. Otherwise, you may need to adjust it a bit to suit your purposes.

In TestL1ModelClustering, I also output the input sentences that were most likely to be assigned to each cluster, using the model's pdf() function to produce a partial ordering of the term vectors. It's all in memory, so it wouldn't work for a large dataset, but if you are just experimenting...
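
As a rough illustration of that pdf-based ordering, here is a minimal in-memory sketch. It is not the code from TestL1ModelClustering: the DensityModel interface, the l1Model helper and the PdfRanking class are hypothetical stand-ins, and plain double[] arrays take the place of Mahout term vectors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class PdfRanking {

    /** Hypothetical stand-in for a cluster model; not a Mahout interface. */
    public interface DensityModel {
        double pdf(double[] point);
    }

    /** Returns the indices of the topN points with the highest pdf under the model, best first. */
    public static List<Integer> topPoints(DensityModel model, List<double[]> points, int topN) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < points.size(); i++) {
            indices.add(i);
        }
        // Sort by descending pdf; everything is held in memory, so this only suits small inputs.
        indices.sort(Comparator.comparingDouble((Integer i) -> model.pdf(points.get(i))).reversed());
        return new ArrayList<>(indices.subList(0, Math.min(topN, indices.size())));
    }

    /** Toy model (an assumption for this sketch): unnormalized density decaying with L1 distance from a center. */
    public static DensityModel l1Model(double[] center) {
        return p -> {
            double d = 0.0;
            for (int i = 0; i < center.length; i++) {
                d += Math.abs(p[i] - center[i]);
            }
            return Math.exp(-d);
        };
    }

    public static void main(String[] args) {
        List<double[]> points = Arrays.asList(
                new double[] {0, 0}, new double[] {5, 5}, new double[] {1, 0}, new double[] {4, 5});
        // One toy model per cluster; print the two most likely inputs for each.
        System.out.println(topPoints(l1Model(new double[] {0, 0}), points, 2)); // prints [0, 2]
        System.out.println(topPoints(l1Model(new double[] {5, 5}), points, 2)); // prints [1, 3]
    }
}
```

With real data you would keep the original sentences in a parallel list and print the sentence at each returned index.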

regards,
Jeff

Jerry Ye wrote:
Thanks for the confirmation, Jeff. How would one view cluster assignments right now, given that the output is binary?



- jerry

________________________________
Jeff Eastman wrote:

Hi Jerry,

DirichletClusters are not similar enough to ClusterBase to make that
workable, so you are correct that the utility won't dump them. Writing a
dump utility that can is a great idea, though it does tend to be rather
Model-specific. Maybe Models should have some printable representation
à la asFormatString().

Look at the code in

/MahoutTrunk/utils/src/test/java/org/apache/mahout/clustering/dirichlet/TestL1ModelClustering.java
/MahoutTrunk/examples/src/main/java/org/apache/mahout/clustering/dirichlet/DisplayOutputState.java

for ideas on how you might be able to dump out your DirichletClusters
and their Models.

I've actually considered making ClusterBase into a Model and
generalizing DirichletCluster to be the root of all clusters. I think
the distance measures used by canopy and k-means could be cast as Model
pdfs but the whole idea is still only half-baked.
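
One possible reading of that idea, purely as a sketch: wrap a center and a distance function in an object whose pdf is exp(-distance). Because exp(-d) decreases monotonically with d, picking the model with the highest pdf picks the nearest center, which is the k-means assignment rule. The DistanceModel class and the exp(-d) mapping are assumptions made for illustration; they are not Mahout's ClusterBase, Model, or DistanceMeasure, and the density is left unnormalized.

```java
import java.util.function.ToDoubleBiFunction;

public class DistanceAsPdf {

    /** Hypothetical model built from a center and a distance function. */
    public static final class DistanceModel {
        private final double[] center;
        private final ToDoubleBiFunction<double[], double[]> distance;

        public DistanceModel(double[] center, ToDoubleBiFunction<double[], double[]> distance) {
            this.center = center;
            this.distance = distance;
        }

        /** Unnormalized density: 1 at the center, falling off as the distance grows. */
        public double pdf(double[] x) {
            return Math.exp(-distance.applyAsDouble(center, x));
        }
    }

    /** Plain Euclidean distance between two vectors of equal length. */
    public static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Index of the model with the highest pdf for x, i.e. the nearest center. */
    public static int mostLikely(DistanceModel[] models, double[] x) {
        int best = 0;
        for (int i = 1; i < models.length; i++) {
            if (models[i].pdf(x) > models[best].pdf(x)) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        DistanceModel[] models = {
            new DistanceModel(new double[] {0, 0}, DistanceAsPdf::euclidean),
            new DistanceModel(new double[] {10, 10}, DistanceAsPdf::euclidean)
        };
        System.out.println(mostLikely(models, new double[] {1, 2})); // prints 0
        System.out.println(mostLikely(models, new double[] {9, 7})); // prints 1
    }
}
```

A canopy-style threshold test could be expressed the same way, by comparing pdf(x) against exp(-t) for a distance threshold t.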

Jeff

Jerry Ye wrote:
I'm trying to view the output of my experiment using Dirichlet Process 
Clustering.  When attempting to use the ClusterDumper utility on the output 
directory, an exception is thrown.  Upon looking closer, DirichletCluster does 
not extend ClusterBase.  The error is below.

Is there some other way that I can view the cluster labels?

Thanks!

- jerry

-bash-3.1$ java -cp mahout-core-0.3-SNAPSHOT.jar:mahout-utils-0.3-SNAPSHOT.jar:$( echo dependency/*.jar . | sed 's/ /:/g') org.apache.mahout.utils.clustering.ClusterDumper -s mahoutout/state-0
Input Path: /homes/jerryye/mahout/mahoutout/state-0/part-0
Exception in thread "main" java.lang.ClassCastException: org.apache.mahout.clustering.dirichlet.DirichletCluster cannot be cast to org.apache.mahout.clustering.ClusterBase
    at org.apache.mahout.utils.clustering.ClusterDumper.printClusters(ClusterDumper.java:119)
    at org.apache.mahout.utils.clustering.ClusterDumper.main(ClusterDumper.java:251)



