Re: Standard Deviation of a Set of Vectors

Derek O'Callaghan Thu, 30 Sep 2010 09:36:56 -0700

Thanks for the tip. I had been generating the representative points sequentially but was still using the MR versions of the clustering algorithms; I'll change that now.
:)

I just tried this, and there seems to be a difference in behaviour between the sequential and MR versions of Canopy. With MR:


   * Mapper called for each point, which calls
     canopyClusterer.addPointToCanopies(point.get(), canopies); - in my
     case 128 canopies are created
   * Reducer called with the canopy centroid points, which then calls
     canopyClusterer.addPointToCanopies(point, canopies); for each of
     these centroids - and I end up with 11 canopies.

And we end up with canopies of canopy centroids.

However, the sequential version doesn't appear to have the equivalent of the Reducer step, so it keeps the original number of canopies. Should it also compute the "canopies of canopies"? At the moment, the MR version is working much better for me with the second canopy generation step, so I'll stick with it for now. I guess the two should be consistent between sequential and MR? I should probably start a separate thread for this...
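The two code paths described above can be sketched in a few lines of Python to show where the difference in canopy counts comes from. This is a simplified illustration, not Mahout's code: the Euclidean distance, the thresholds T1 = 3.0 and T2 = 2.0, measuring distances to the point that seeded each canopy, and all function names here are assumptions for illustration. Only the overall shape (a Mapper pass over the points, then a Reducer pass over the Mappers' canopy centroids) comes from the description above.

```python
import math

# Hypothetical thresholds for illustration only (T1 > T2).
T1, T2 = 3.0, 2.0


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class Canopy:
    """Seed point plus every point observed within T1 of it (simplified)."""

    def __init__(self, center):
        self.center = center      # assumption: distances are measured to the seed
        self.points = [center]    # assumption: the seed is a member of its own canopy

    def centroid(self):
        n = len(self.points)
        return tuple(sum(coords) / n for coords in zip(*self.points))


def add_point_to_canopies(point, canopies, t1=T1, t2=T2):
    """Simplified per-point canopy rule (hypothetical helper): add the point to
    every canopy within t1; start a new canopy if no canopy is within t2."""
    strongly_bound = False
    for canopy in canopies:
        d = euclidean(canopy.center, point)
        if d < t1:
            canopy.points.append(point)
        strongly_bound = strongly_bound or d < t2
    if not strongly_bound:
        canopies.append(Canopy(point))


def cluster(points):
    canopies = []
    for p in points:
        add_point_to_canopies(p, canopies)
    return canopies


def sequential_canopies(points):
    """Single pass over the input points (no Reducer-equivalent step)."""
    return [c.centroid() for c in cluster(points)]


def mr_style_canopies(partitions):
    """Mapper pass per input split, then a Reducer pass over the centroids
    emitted by all Mappers: 'canopies of canopy centroids'."""
    mapper_centroids = []
    for part in partitions:
        mapper_centroids.extend(c.centroid() for c in cluster(part))
    return [c.centroid() for c in cluster(mapper_centroids)]


EXAMPLE_POINTS = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (11, 0)]

print(sequential_canopies(EXAMPLE_POINTS))  # [(1.0, 0.0), (2.5, 0.0), (10.5, 0.0)]
print(mr_style_canopies([EXAMPLE_POINTS]))  # [(1.75, 0.0), (10.5, 0.0)]
```

On these six toy points the single pass yields three canopies, while the extra pass over the centroids merges the two nearby ones and leaves two. That is the same "canopies of canopies" effect as the 128 to 11 reduction above, on a much smaller scale. Making the sequential path consistent would amount to running `cluster` a second time over its own centroids.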


I guess I don't quite understand your question. Can you please elaborate?

Sorry, what I wanted to ask was: is it okay to use ClusterEvaluator.intraClusterDensity()? Or should only ClusterEvaluator.interClusterDensity() be used?

I have to leave for the evening, but if you need me to check anything further here re: canopy, I can take a look tomorrow.
