+1. That would certainly be useful too, as it would save an unnecessary 
preprocessing step.

-----Original Message-----
From: Grant Ingersoll [mailto:[email protected]] 
Sent: Thursday, July 14, 2011 5:29 AM
To: [email protected]
Subject: Cluster seeds was Re: Emitting distance from centroid for K-Means


On Jul 13, 2011, at 6:42 PM, Jeff Eastman wrote:
> 
> The assumption about input seeds originally came from using Canopy to prime 
> KMeans but it has become the prior set of clusters since the algorithms have 
> converged on common formats & models. Each iteration reads in the set of 
> clusters-n and outputs clusters-n+1, so changing this would have broad 
> impact. FuzzyK and Dirichlet use the same iteration semantics and the 
> ClusterIterator depends on this for unification with classification 
> interfaces.

That makes total sense, and I would want to keep those semantics.  Basically, 
it's just an option to add Vector as well to the check there.

-Grant
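
To make the proposal above concrete, here is a minimal sketch in Python (not 
Mahout's Java code) of a seed check that accepts plain vectors as well as 
prior clusters, while each iteration still reads clusters-n and produces 
clusters-n+1. The names Cluster, as_clusters, and iterate are hypothetical 
and do not correspond to Mahout's API; the update rule is plain k-means, 
assumed here only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union

Vector = Sequence[float]


@dataclass
class Cluster:
    """Hypothetical stand-in for a prior cluster (not Mahout's Cluster class)."""
    cluster_id: int
    center: List[float]


def as_clusters(seeds: Sequence[Union[Cluster, Vector]]) -> List[Cluster]:
    """Accept prior clusters or bare vectors; wrap each bare vector as a cluster."""
    clusters = []
    for i, seed in enumerate(seeds):
        if isinstance(seed, Cluster):
            clusters.append(seed)  # already a cluster: pass through unchanged
        else:
            # Assumption: a bare vector becomes a cluster centered on itself.
            clusters.append(Cluster(i, [float(x) for x in seed]))
    return clusters


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def iterate(clusters_n: List[Cluster], points: Sequence[Vector]) -> List[Cluster]:
    """One pass: read clusters-n, return clusters-n+1 (plain k-means update)."""
    members: List[List[Vector]] = [[] for _ in clusters_n]
    for p in points:
        # Assign each point to the position of its nearest current center.
        nearest = min(
            range(len(clusters_n)),
            key=lambda i: squared_distance(clusters_n[i].center, p),
        )
        members[nearest].append(p)

    clusters_next = []
    for cluster, assigned in zip(clusters_n, members):
        if assigned:
            center = [sum(col) / len(assigned) for col in zip(*assigned)]
        else:
            center = list(cluster.center)  # empty cluster keeps its old center
        clusters_next.append(Cluster(cluster.cluster_id, center))
    return clusters_next
```

With this shape, the caller can pass raw vectors straight to as_clusters 
without a separate step that converts them to clusters first, and the 
iteration itself is unchanged.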
