I've run the Dirichlet job from the command line and now have an output folder containing state-0, state-1, ... state-5 folders, each of which holds a part-00000 file and a .part-00000.crc file. However, the "Retrieving the Output" section of the ClusteringYourData wiki page just says TODO, so I don't know how to turn those part files into something useful.
http://cwiki.apache.org/MAHOUT/clusteringyourdata.html

I successfully ran the org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job test, which output its data as text (to the console at least), so I tried copying the printResults() methods from that class into org.apache.mahout.clustering.dirichlet.DirichletJob, but to no avail. Can someone help?

Also, the command-line job asks for the prototypeSize (the -s param). When I converted my Lucene index to a vector file, the output said it created 11 vectors, but when I specified that value for prototypeSize the job failed, saying it found 1793 vectors. Changing the value I specify to 1793 works, but now I wonder why I need to specify it at all if the job can figure it out. Could it not be optional?