On 9/30/10 11:38 AM, Derek O'Callaghan wrote:
Thanks for the tip. I had been generating the representative points sequentially but was still using the MR versions of the clustering algorithms; I'll change that now.
:)

Regarding ClusterEvaluator, it seems to rely on RepresentativePointsDriver having been run already, since it loads the representative points in the ClusterEvaluator(Configuration conf, Path clustersIn) constructor. I see another constructor, ClusterEvaluator(Map<Integer, List<VectorWritable>> representativePoints, List<Cluster> clusters, DistanceMeasure measure), where you can specify these points yourself, but it is marked as "test only". Is it okay to use this one, passing in the cluster centres, or will it ultimately be removed?
I'm not planning to remove it: the unit tests require it, and I won't remove them. If it is useful to you, go ahead. I will change the comment to "useful for testing".

I guess the question is whether ClusterEvaluator.intraClusterDensity() can be used, given that it relies on a set of points and not just the centre, which is all that interClusterDensity() requires. FYI, I had to modify my local copy to ignore my "identical points" cluster, as it was generating a NaN density.
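
For anyone following along, here is a minimal stand-alone sketch of how a cluster of identical points can produce a NaN density, and of one way to skip such clusters. It does not use Mahout's classes. DensitySketch, its method names, and the formula (mean - min) / (max - min) over pairwise distances are assumptions made for illustration, not a copy of ClusterEvaluator.

```java
import java.util.List;

// Hypothetical stand-alone sketch; this is NOT Mahout's ClusterEvaluator.
final class DensitySketch {

  // Euclidean distance between two points of equal dimension.
  static double distance(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  // Assumed formula: mean pairwise distance, normalised by the range of
  // pairwise distances, i.e. (mean - min) / (max - min).
  // If every point is identical, min == max == 0 and the result is 0.0 / 0.0 = NaN.
  static double intraClusterDensity(List<double[]> points) {
    double min = Double.MAX_VALUE;
    double max = 0.0;
    double sum = 0.0;
    int count = 0;
    for (int i = 0; i < points.size(); i++) {
      for (int j = i + 1; j < points.size(); j++) {
        double d = distance(points.get(i), points.get(j));
        min = Math.min(min, d);
        max = Math.max(max, d);
        sum += d;
        count++;
      }
    }
    return (sum / count - min) / (max - min);
  }

  // Averages the per-cluster densities, ignoring clusters whose density is NaN.
  // Returns NaN only if no cluster has a defined density.
  static double averageDensity(List<List<double[]>> clusters) {
    double total = 0.0;
    int used = 0;
    for (List<double[]> cluster : clusters) {
      double density = intraClusterDensity(cluster);
      if (!Double.isNaN(density)) {
        total += density;
        used++;
      }
    }
    return used == 0 ? Double.NaN : total / used;
  }
}
```

Skipping the degenerate cluster keeps it from poisoning the average, since any arithmetic involving NaN yields NaN. Whether such a cluster should instead count as maximally dense is a separate design choice that this sketch does not settle.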

I guess I don't quite understand your question. Can you please elaborate?
