While reviewing our clustering code for an upcoming training, I ran into a 
question about what we claim for Canopy clustering and wanted to check here 
first.  In the lectures and other material I've seen, the idea is to run Canopy 
first and then a more expensive algorithm such as k-means, so that items 
further away than T2 are not even considered when scoring a centroid in the 
more expensive step.  However, I can't find where in the code this actually 
happens.  We do have code that lets k-means use the Canopy centroids as its 
initial centroids, but the other material seemed to imply that more aggressive 
pruning is possible, since points outside of T2 would never need to be 
considered.  Without that pruning, the only thing we gain from running Canopy 
first is a (likely) better set of starting centroids.  I haven't thought about 
how this would be implemented.
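
To make the question concrete, here is a rough Python sketch of the kind of 
pruning I mean.  It is not our code and every name in it is made up.  It 
assumes the formulation I remember from the material: T1 > T2, a point belongs 
to every canopy whose center is within T1, points within T2 of a center can no 
longer seed a new canopy, and a centroid is only scored against points that 
share a canopy with it.  For brevity it uses one distance function throughout; 
as I understand it, canopy membership would normally use a cheaper measure.

```python
import math


def build_canopies(points, t1, t2):
    """Pick canopy centers greedily (hypothetical helper, assumes t1 > t2)."""
    remaining = list(range(len(points)))
    centers = []
    while remaining:
        center = points[remaining[0]]
        centers.append(center)
        # Anything within the tight threshold t2 can no longer seed a canopy.
        remaining = [i for i in remaining if math.dist(points[i], center) > t2]
    return centers


def canopies_of(p, centers, t1):
    """Indices of all canopies whose center lies within the loose threshold t1."""
    return {j for j, c in enumerate(centers) if math.dist(p, c) <= t1}


def pruned_assign(points, point_canopies, centroids, centers, t1):
    """One k-means assignment step that skips centroids sharing no canopy.

    Returns (labels, number of point-to-centroid distances computed).
    """
    centroid_canopies = [canopies_of(c, centers, t1) for c in centroids]
    labels = []
    n_distances = 0
    for p, p_can in zip(points, point_canopies):
        candidates = [k for k, c_can in enumerate(centroid_canopies)
                      if c_can & p_can]
        if not candidates:
            # A centroid may have drifted out of every shared canopy;
            # fall back to the unpruned comparison for this point.
            candidates = list(range(len(centroids)))
        n_distances += len(candidates)
        labels.append(min(candidates, key=lambda k: math.dist(p, centroids[k])))
    return labels, n_distances


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
T1, T2 = 3.0, 2.0
centers = build_canopies(points, T1, T2)
# Point-to-canopy membership is computed once and reused on every iteration.
point_canopies = [canopies_of(p, centers, T1) for p in points]
# Seed k-means with the canopy centers, which is what we already support.
labels, n_distances = pruned_assign(points, point_canopies, centers, centers, T1)
print(labels, n_distances)
```

On these two well-separated groups the assignment step computes 6 
point-to-centroid distances instead of the 12 an unpruned step would need.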

Then again, it's late and I'm tired.

-Grant
