Hi Jerry,

DirichletCluster is not similar enough to ClusterBase for ClusterDumper to handle it, so you are correct that the utility won't dump them. Writing a dump utility that can is a great idea, though it does tend to be rather Model-specific. Maybe Models should have some printable representation à la asFormatString().

Look at the code in

/MahoutTrunk/utils/src/test/java/org/apache/mahout/clustering/dirichlet/TestL1ModelClustering.java
/MahoutTrunk/examples/src/main/java/org/apache/mahout/clustering/dirichlet/DisplayOutputState.java

for ideas on how you might be able to dump out your DirichletClusters and their Models.
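
As a rough illustration of the "printable representation" idea, here is a minimal sketch. Everything in it is hypothetical: PrintableModel, MeanModel and dump() are made-up names for illustration and are not part of Mahout; only the method name asFormatString() is borrowed from the suggestion above. It assumes each model can render itself as one line of text, so that a generic dumper does not need to know the concrete Model type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ModelDumpSketch {

  /** Hypothetical interface (not Mahout's): a model that can describe itself as text. */
  interface PrintableModel {
    String asFormatString();
  }

  /** Hypothetical stand-in for a cluster's model: a mean vector plus a sample count. */
  static final class MeanModel implements PrintableModel {
    private final double[] mean;
    private final int count;

    MeanModel(double[] mean, int count) {
      this.mean = mean.clone();
      this.count = count;
    }

    @Override
    public String asFormatString() {
      StringBuilder sb = new StringBuilder("MeanModel{n=").append(count).append(", mean=[");
      for (int i = 0; i < mean.length; i++) {
        if (i > 0) {
          sb.append(", ");
        }
        // Locale.ROOT keeps the decimal separator stable across machines.
        sb.append(String.format(Locale.ROOT, "%.3f", mean[i]));
      }
      return sb.append("]}").toString();
    }
  }

  /** Builds one line per model, prefixed with the model's index in the list. */
  static String dump(List<? extends PrintableModel> models) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < models.size(); i++) {
      sb.append(i).append(": ").append(models.get(i).asFormatString()).append('\n');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    List<PrintableModel> models = new ArrayList<>();
    models.add(new MeanModel(new double[] {1.0, 2.5}, 10));
    models.add(new MeanModel(new double[] {-0.25, 4.0}, 3));
    System.out.print(dump(models));
  }
}
```

The dumper only depends on the interface, so the Model-specific part stays inside each model's own asFormatString().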

I've actually considered making ClusterBase into a Model and generalizing DirichletCluster to be the root of all clusters. I think the distance measures used by canopy and k-means could be cast as Model pdfs but the whole idea is still only half-baked.
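
To make the "distance measure as a Model pdf" idea a little more concrete, here is one possible sketch. It is an assumption, not a worked-out design: the Distance and Model interfaces are hypothetical (they are not Mahout's DistanceMeasure or Model), and exp(-distance) is just one arbitrary choice of unnormalized density that decreases as a point moves away from the cluster center.

```java
public class DistancePdfSketch {

  /** Hypothetical minimal distance interface (not Mahout's DistanceMeasure). */
  interface Distance {
    double distance(double[] a, double[] b);
  }

  /** Hypothetical minimal model interface: an unnormalized density at a point. */
  interface Model {
    double pdf(double[] x);
  }

  /** Plain Euclidean distance between two equal-length vectors. */
  static final Distance EUCLIDEAN = (a, b) -> {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  };

  /**
   * Wraps a center and a distance as a model. The density exp(-d) is an
   * assumed, unnormalized form chosen only for illustration.
   */
  static Model fromDistance(double[] center, Distance measure) {
    double[] c = center.clone();
    return x -> Math.exp(-measure.distance(c, x));
  }

  public static void main(String[] args) {
    Model m = fromDistance(new double[] {0.0, 0.0}, EUCLIDEAN);
    System.out.println(m.pdf(new double[] {0.0, 0.0})); // density at the center
    System.out.println(m.pdf(new double[] {3.0, 4.0})); // density at distance 5
  }
}
```

With something like this, the nearest-center assignment in k-means would correspond to picking the model with the highest density for a point, which is the sense in which the two views might be unified.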

Jeff

Jerry Ye wrote:
I'm trying to view the output of my experiment using Dirichlet Process 
Clustering.  When attempting to use the ClusterDumper utility on the output 
directory, an exception is thrown.  Upon looking closer, DirichletCluster does 
not extend ClusterBase.  The error is below.

Is there some other way that I can view the cluster labels?

Thanks!

- jerry

-bash-3.1$ java -cp mahout-core-0.3-SNAPSHOT.jar:mahout-utils-0.3-SNAPSHOT.jar:$( echo dependency/*.jar . | sed 's/ /:/g') org.apache.mahout.utils.clustering.ClusterDumper -s mahoutout/state-0
Input Path: /homes/jerryye/mahout/mahoutout/state-0/part-0
Exception in thread "main" java.lang.ClassCastException: org.apache.mahout.clustering.dirichlet.DirichletCluster cannot be cast to org.apache.mahout.clustering.ClusterBase
    at org.apache.mahout.utils.clustering.ClusterDumper.printClusters(ClusterDumper.java:119)
    at org.apache.mahout.utils.clustering.ClusterDumper.main(ClusterDumper.java:251)

