Hi Robin,
Great! I've got the refactoring changes for consolidating all the
various cluster types under a Cluster interface (formerly Printable but
now with id, numPoints and a center added). Dirichlet models don't yet
have meaningful ids implemented, but they all do (so far anyway) have
a notion of "numPoints" and a "center". I'm working on tests tomorrow to
make sure the ClusterDumper actually works with Dirichlet clusters, and
then I will commit the changes, most likely Wednesday or Thursday.
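To make the shape of that interface concrete, here is a rough sketch only. The method names, the SimpleCluster class and the format string are hypothetical, not the actual Mahout signatures, and a plain double[] stands in for a Mahout Vector so the snippet compiles on its own.

```java
import java.util.Arrays;

// Hypothetical sketch of a consolidated cluster interface: the old
// "printable" behaviour plus an id, a point count and a center.
interface Cluster {
    int getId();               // may be a placeholder for models without meaningful ids
    int getNumPoints();        // number of points assigned to this cluster
    double[] getCenter();      // stand-in for a Mahout Vector
    String asFormatString();   // the formatting role formerly played by Printable
}

// Minimal illustrative implementation; not taken from the Mahout code base.
final class SimpleCluster implements Cluster {
    private final int id;
    private final int numPoints;
    private final double[] center;

    SimpleCluster(int id, int numPoints, double[] center) {
        this.id = id;
        this.numPoints = numPoints;
        this.center = center.clone(); // defensive copy
    }

    @Override public int getId() { return id; }
    @Override public int getNumPoints() { return numPoints; }
    @Override public double[] getCenter() { return center.clone(); }

    @Override public String asFormatString() {
        return "C" + id + ": n=" + numPoints + " c=" + Arrays.toString(center);
    }
}

final class ClusterSketch {
    public static void main(String[] args) {
        Cluster c = new SimpleCluster(7, 3, new double[] {1.0, 2.5});
        // A dumper-style consumer only needs the interface, not the concrete type.
        System.out.println(c.asFormatString()); // prints "C7: n=3 c=[1.0, 2.5]"
    }
}
```

The point of the sketch is that anything like ClusterDumper could be written against the interface alone, so a Dirichlet model would only need to supply these few accessors to be dumped alongside the other cluster types.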
BTW, I changed my mind about foisting off the old Printable interface on
Vectors (but am still open to the idea if somebody actually working in
math thinks it is worth doing). All the new Clusters use the vector
formatting done in ClusterBase.
What I'd really like is feedback from ClusterDumper users on what is
working and what is needed to address MAHOUT-236. That includes you, right?
Jeff
PS: Ted, you expressed some doubts about the value of consolidating
Dirichlet clusters with the others. So far it seems to be a reasonable
fit but I'm doing the engineering on a tiny subset of simple models
without enough theoretical insight to see any pitfalls ahead. Is there a
"DistanceMeasure-like" discussion that might provide a firmer
underpinning for this work?
Robin Anil wrote:
No one yet. I am willing to help in case you need an extra pair of hands on
this one.
Robin