Hi Robin,

Great! I've got the refactoring changes that consolidate all the various cluster types under a Cluster interface (formerly Printable, but now with an id, numPoints and a center added). Dirichlet models don't yet have meaningful ids implemented, but they all (so far anyway) have a notion of "numPoints" and a "center". Tomorrow I'm working on tests to make sure the ClusterDumper actually works with Dirichlet clusters, and then I will commit the changes, most likely Wednesday or Thursday.
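
For anyone who hasn't looked at the patch, here is a rough sketch of the shape such an interface might take. It is illustrative only and not the committed code: the method names, the ExampleCluster class, the output format, and the use of a plain double[] in place of a Mahout Vector are all assumptions made to keep the example self-contained.

```java
import java.util.Arrays;

// Hypothetical sketch of a consolidated cluster interface.
// Method names are assumptions, not the actual Mahout signatures.
interface Cluster {
    int getId();              // may not be meaningful for every model yet
    int getNumPoints();       // number of points assigned to this cluster
    double[] getCenter();     // double[] stands in for a Mahout Vector here
    String asFormatString();  // the role the old Printable interface played
}

// Hypothetical implementation used only to exercise the interface.
class ExampleCluster implements Cluster {
    private final int id;
    private final int numPoints;
    private final double[] center;

    ExampleCluster(int id, int numPoints, double[] center) {
        this.id = id;
        this.numPoints = numPoints;
        this.center = center.clone(); // defensive copy of the caller's array
    }

    @Override
    public int getId() {
        return id;
    }

    @Override
    public int getNumPoints() {
        return numPoints;
    }

    @Override
    public double[] getCenter() {
        return center.clone(); // do not expose internal state
    }

    @Override
    public String asFormatString() {
        // Made-up format: the vector is formatted in one place so that
        // every cluster type would print its center the same way.
        return "C" + id + "{n=" + numPoints + " c=" + Arrays.toString(center) + "}";
    }
}
```

A dumper-style tool would then only need to depend on the interface, for example by calling asFormatString() on each cluster it reads, regardless of which clustering algorithm produced it.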

BTW, I changed my mind about foisting off the old Printable interface on Vectors (but am still open to the idea if somebody actually working in math thinks it is worth doing). All the new Clusters use the vector formatting done in ClusterBase.

What I'd really like is feedback from ClusterDumper users on what is working and what is needed to address MAHOUT-236. That includes you, right?

Jeff

PS: Ted, you expressed some doubts about the value of consolidating Dirichlet clusters with the others. So far it seems to be a reasonable fit but I'm doing the engineering on a tiny subset of simple models without enough theoretical insight to see any pitfalls ahead. Is there a "DistanceMeasure-like" discussion that might provide a firmer underpinning for this work?



Robin Anil wrote:
No one yet. I am willing to help in case you need an extra pair of hands on
this one.

Robin
