+1. That would certainly be useful too, as it would save an unnecessary preprocessing step.
-----Original Message-----
From: Grant Ingersoll [mailto:[email protected]]
Sent: Thursday, July 14, 2011 5:29 AM
To: [email protected]
Subject: Cluster seeds was Re: Emitting distance from centroid for K-Means

On Jul 13, 2011, at 6:42 PM, Jeff Eastman wrote:

> The assumption about input seeds originally came from using Canopy to prime
> KMeans but it has become the prior set of clusters since the algorithms have
> converged on common formats & models. Each iteration reads in the set of
> clusters-n and outputs clusters-n+1, so changing this would have broad
> impact. FuzzyK and Dirichlet use the same iteration semantics and the
> ClusterIterator depends on this for unification with classification
> interfaces.

That makes total sense and I would want to keep that semantics. Basically, it's just an option to add in Vector as well to the check there.

-Grant
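For illustration only, here is a rough Python sketch of the idea discussed in this thread: the seed loader accepts either prior clusters or bare vectors, while each iteration still reads clusters-n and emits clusters-n+1. This is not Mahout code. The names `Cluster`, `load_seeds`, and `iterate` are hypothetical, and the distance measure (squared Euclidean) is an assumption.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union

Vector = Sequence[float]


@dataclass
class Cluster:
    """Hypothetical stand-in for a prior cluster; not a Mahout class."""
    cluster_id: int
    center: List[float]


def load_seeds(records: Sequence[Union[Cluster, Vector]]) -> List[Cluster]:
    """Accept seeds as clusters or bare vectors; always return clusters.

    Bare vectors are wrapped so the caller can skip a separate
    preprocessing step. Ids for wrapped vectors use the record index
    (an assumption; the caller must keep ids unique when mixing types).
    """
    clusters = []
    for i, rec in enumerate(records):
        if isinstance(rec, Cluster):
            clusters.append(rec)
        else:
            clusters.append(Cluster(cluster_id=i, center=[float(x) for x in rec]))
    return clusters


def iterate(prior: List[Cluster], points: Sequence[Vector]) -> List[Cluster]:
    """One k-means pass: read clusters-n, return clusters-(n+1)."""
    sums = [[0.0] * len(c.center) for c in prior]
    counts = [0] * len(prior)
    for p in points:
        # Assign the point to the nearest prior center (squared Euclidean).
        nearest = min(
            range(len(prior)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(prior[k].center, p)),
        )
        counts[nearest] += 1
        sums[nearest] = [s + x for s, x in zip(sums[nearest], p)]
    result = []
    for k, c in enumerate(prior):
        # An empty cluster keeps its previous center.
        center = [s / counts[k] for s in sums[k]] if counts[k] else list(c.center)
        result.append(Cluster(c.cluster_id, center))
    return result
```

Because the vectors are wrapped before the first pass, the iteration step itself is unchanged; only the input check differs.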
