On Jun 16, 2009, at 11:43 PM, Shashikant Kore wrote:

I had hacked the code to put labels for the vectors.

OK, so we've put a lot of this in place now with MAHOUT-65.

Then I modified
KMeans to output the document label, Cluster ID, and distance from the
cluster.

Do you think there is a way to make this generic for all of the clustering jobs? Seems like this would be handy to have in the new Utils module I'm working on for MAHOUT-126 (committing today).

Care to throw up a patch as a starting point like you did for MAHOUT-126?
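
For illustration, a clustering-agnostic version of that output might look like the sketch below. It assumes labeled vectors and final centroids are already in memory, and it uses plain Euclidean distance. The class and method names (ClusterAssignmentSketch, Assignment, assign) are hypothetical and are not part of Mahout's API; this is not the hacked KMeans code described above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names here are illustrative, not Mahout classes.
public class ClusterAssignmentSketch {

    /** One output row: document label, nearest cluster ID, distance to its centroid. */
    public static final class Assignment {
        public final String label;
        public final int clusterId;
        public final double distance;

        Assignment(String label, int clusterId, double distance) {
            this.label = label;
            this.clusterId = clusterId;
            this.distance = distance;
        }

        @Override
        public String toString() {
            // Tab-separated so the rows are easy to feed to another utility.
            return label + "\t" + clusterId + "\t" + distance;
        }
    }

    /** Euclidean distance (an assumption; any distance measure could be swapped in). */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Assigns each labeled vector to the nearest centroid; cluster ID is the centroid's index. */
    public static List<Assignment> assign(Map<String, double[]> labeledVectors,
                                          List<double[]> centroids) {
        List<Assignment> out = new ArrayList<>();
        for (Map.Entry<String, double[]> e : labeledVectors.entrySet()) {
            int best = -1;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int c = 0; c < centroids.size(); c++) {
                double d = euclidean(e.getValue(), centroids.get(c));
                if (d < bestDist) {
                    bestDist = d;
                    best = c;
                }
            }
            out.add(new Assignment(e.getKey(), best, bestDist));
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up toy data, only to show the shape of the output.
        Map<String, double[]> docs = new LinkedHashMap<>();
        docs.put("doc-a", new double[] {0.0, 0.0});
        docs.put("doc-b", new double[] {9.0, 10.0});

        List<double[]> centroids = new ArrayList<>();
        centroids.add(new double[] {1.0, 0.0});
        centroids.add(new double[] {10.0, 10.0});

        for (Assignment a : assign(docs, centroids)) {
            System.out.println(a);
        }
    }
}
```

Since the assignment step only needs labeled vectors and a list of centroids, the same post-processing could in principle run after any clustering job that produces centroids, which is what would make it a candidate for a shared utility.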

Another utility takes this input and converts the labels to the
actual text files from which they were created. Then I do random checks
manually for the documents in a cluster.


OK, so ad hoc.  Definitely a reasonable thing to do at this point.

I wonder if we could hook into Carrot2 visualization tools at all. They have some really nice tools and perhaps we can output our stuff in a way that works for them. I imagine Weka does too. I suppose this all gets back to supporting more common input/output formats. Although, it seems the JSON (GSON) stuff is pretty powerful that way too.

-Grant
