I agree, but this will require an API extension to Model, as I suggested below, because each model type has its own parameters that need to be represented. I'll open a Jira for it.

Jeff

Grant Ingersoll wrote:
We should probably have ClusterDumper still handle Dirichlet jobs, so that users don't need to deal with more than one interface.

On Jan 26, 2010, at 11:25 PM, Jeff Eastman wrote:

Hi Jerry,

DirichletClusters are not similar enough to ClusterBase to make that workable, 
so you are correct that the utility won't dump them. Writing a dump utility 
that can is a great idea, though it does tend to be rather Model-specific. 
Maybe Models should have some printable representation, à la asFormatString().
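
A minimal sketch of what such a printable representation might look like. Everything here is an assumption made for illustration: the PrintableModel interface, the two model classes and the output format are hypothetical and are not Mahout's actual Model API. The point is only that each model type formats its own parameters, so a dump utility does not need to know about them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical interface, not part of Mahout: each model renders its own parameters.
interface PrintableModel {
    String asFormatString();
}

// Hypothetical model parameterized by a mean vector and a standard deviation.
final class NormalModelSketch implements PrintableModel {
    private final double[] mean;
    private final double stdDev;

    NormalModelSketch(double[] mean, double stdDev) {
        this.mean = mean.clone();
        this.stdDev = stdDev;
    }

    @Override
    public String asFormatString() {
        return "nm{m=" + Arrays.toString(mean)
            + " sd=" + String.format(Locale.ROOT, "%.2f", stdDev) + "}";
    }
}

// Hypothetical model with a different parameter set (coefficients only).
final class L1ModelSketch implements PrintableModel {
    private final double[] coefficients;

    L1ModelSketch(double[] coefficients) {
        this.coefficients = coefficients.clone();
    }

    @Override
    public String asFormatString() {
        return "l1m{c=" + Arrays.toString(coefficients) + "}";
    }
}

// A dumper that relies only on the interface, one model per line.
final class ModelDumpSketch {
    static String dump(List<? extends PrintableModel> models) {
        StringBuilder sb = new StringBuilder();
        for (PrintableModel model : models) {
            sb.append(model.asFormatString()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump(Arrays.<PrintableModel>asList(
            new NormalModelSketch(new double[] {1.0, 2.0}, 0.5),
            new L1ModelSketch(new double[] {3.0}))));
    }
}
```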

Look at the code in

/MahoutTrunk/utils/src/test/java/org/apache/mahout/clustering/dirichlet/TestL1ModelClustering.java
/MahoutTrunk/examples/src/main/java/org/apache/mahout/clustering/dirichlet/DisplayOutputState.java

for ideas on how you might be able to dump out your DirichletClusters and their 
Models.

I've actually considered making ClusterBase into a Model and generalizing 
DirichletCluster to be the root of all clusters. I think the distance measures 
used by canopy and k-means could be cast as Model pdfs but the whole idea is 
still only half-baked.
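
One way to read the idea of casting a distance measure as a Model pdf, sketched under stated assumptions: the mapping exp(-d^2 / 2) is chosen here purely for illustration (it is an unnormalized, unit-variance Gaussian kernel when d is Euclidean) and is not something proposed in this thread. All class and method names are hypothetical. Because the mapping decreases as the distance grows, picking the nearest center and picking the center with the highest pdf give the same assignment.

```java
// Hypothetical sketch: a distance measure recast as an unnormalized pdf.
final class DistancePdfSketch {

    // Plain Euclidean distance between two vectors of equal length.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Assumed mapping from distance to an unnormalized density: exp(-d^2 / 2).
    static double pdf(double[] center, double[] x) {
        double d = euclidean(center, x);
        return Math.exp(-0.5 * d * d);
    }

    // k-means style assignment: index of the center with the smallest distance.
    static int nearestByDistance(double[][] centers, double[] x) {
        int best = 0;
        for (int i = 1; i < centers.length; i++) {
            if (euclidean(centers[i], x) < euclidean(centers[best], x)) {
                best = i;
            }
        }
        return best;
    }

    // Model style assignment: index of the center with the largest pdf value.
    static int mostLikelyByPdf(double[][] centers, double[] x) {
        int best = 0;
        for (int i = 1; i < centers.length; i++) {
            if (pdf(centers[i], x) > pdf(centers[best], x)) {
                best = i;
            }
        }
        return best;
    }
}
```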

Jeff

Jerry Ye wrote:
I'm trying to view the output of my experiment using Dirichlet Process 
Clustering.  When attempting to use the ClusterDumper utility on the output 
directory, an exception is thrown.  Upon looking closer, DirichletCluster does 
not extend ClusterBase.  The error is below.

Is there some other way that I can view the cluster labels?

Thanks!

- jerry

-bash-3.1$ java -cp mahout-core-0.3-SNAPSHOT.jar:mahout-utils-0.3-SNAPSHOT.jar:$( echo dependency/*.jar . | sed 's/ /:/g') org.apache.mahout.utils.clustering.ClusterDumper -s mahoutout/state-0
Input Path: /homes/jerryye/mahout/mahoutout/state-0/part-0
Exception in thread "main" java.lang.ClassCastException: org.apache.mahout.clustering.dirichlet.DirichletCluster cannot be cast to org.apache.mahout.clustering.ClusterBase
   at org.apache.mahout.utils.clustering.ClusterDumper.printClusters(ClusterDumper.java:119)
   at org.apache.mahout.utils.clustering.ClusterDumper.main(ClusterDumper.java:251)
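
The exception above is what an unchecked downcast produces when the value read back is not a subtype of the expected class. The sketch below reproduces that relationship with stand-in types and shows an instanceof guard that reports an unsupported type instead of throwing. The stand-in classes and the dumpValue method are hypothetical; this is not ClusterDumper's actual code.

```java
// Stand-in for the common base class that the dumper expects (hypothetical).
abstract class ClusterBaseStandIn {
    abstract String describe();
}

// Stand-in for a cluster type that does extend the base class.
final class KMeansClusterStandIn extends ClusterBaseStandIn {
    private final int id;

    KMeansClusterStandIn(int id) {
        this.id = id;
    }

    @Override
    String describe() {
        return "kmeans cluster " + id;
    }
}

// Stand-in for a cluster type that does NOT extend the base class,
// mirroring the relationship reported in the thread.
final class DirichletClusterStandIn {
}

final class SafeDumpSketch {
    // Checks the runtime type before casting, so an unrelated type
    // yields a message rather than a ClassCastException.
    static String dumpValue(Object value) {
        if (value instanceof ClusterBaseStandIn) {
            return ((ClusterBaseStandIn) value).describe();
        }
        return "unsupported cluster type: " + value.getClass().getSimpleName();
    }
}
```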


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search


