On 9/30/10 11:38 AM, Derek O'Callaghan wrote:
Thanks for the tip. I had been generating the representative points
sequentially but was still using the MR versions of the clustering
algorithms; I'll change that now.
:)
Regarding ClusterEvaluator, it seems to rely on
RepresentativePointsDriver having already been run, since the
ClusterEvaluator(Configuration conf, Path clustersIn) constructor loads
the representative points. Is that correct? I see another constructor,
ClusterEvaluator(Map<Integer, List<VectorWritable>> representativePoints,
List<Cluster> clusters, DistanceMeasure measure), where you can specify
these points directly, but it is marked as "test only". Is it okay to
use it, passing in the cluster centres, or will it ultimately be removed?
I'm not planning to remove it, as the unit tests require it and I won't
remove them. If it is useful for you, go ahead. I will change the
comment to "useful for testing".
I guess the question is: can ClusterEvaluator.intraClusterDensity() be
used this way, given that it relies on a set of points rather than just
the centre, which is all that interClusterDensity() requires? FYI, I had
to modify my local copy to ignore my "identical points" cluster, as it
was generating a NaN density.
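To illustrate how a degenerate cluster can yield NaN, here is a simplified, hypothetical sketch in plain Java. It is not Mahout's ClusterEvaluator code, and I haven't checked it against the real source. It assumes the intra-cluster density is the mean pairwise distance between a cluster's representative points, min-max normalised by the smallest and largest pairwise distances. Under that assumption, a cluster whose points are all identical gives 0/0, and a cluster represented only by its centre has no pairs at all, so both cases come out as NaN. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for an intra-cluster density measure.
// NOT Mahout's ClusterEvaluator code; the min-max normalisation is an assumption.
class IntraClusterDensitySketch {

    // Euclidean distance between two points of equal dimension.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Mean pairwise distance between representative points, scaled into [0, 1]
    // using the smallest and largest pairwise distances.
    static double density(List<double[]> repPoints) {
        double min = Double.MAX_VALUE;
        double max = 0.0;
        double sum = 0.0;
        int count = 0;
        for (int j = 0; j < repPoints.size(); j++) {
            for (int k = j + 1; k < repPoints.size(); k++) {
                double d = distance(repPoints.get(j), repPoints.get(k));
                min = Math.min(min, d);
                max = Math.max(max, d);
                sum += d;
                count++;
            }
        }
        double mean = sum / count;          // 0.0 / 0 is NaN when there are no pairs
        return (mean - min) / (max - min);  // 0.0 / 0.0 is NaN when all points coincide
    }

    public static void main(String[] args) {
        List<double[]> spread = Arrays.asList(
            new double[] {0, 0}, new double[] {3, 0}, new double[] {0, 4});
        List<double[]> identical = Arrays.asList(
            new double[] {1, 1}, new double[] {1, 1}, new double[] {1, 1});
        List<double[]> centreOnly = Arrays.asList(new double[] {1, 1});

        System.out.println(density(spread));     // 0.5
        System.out.println(density(identical));  // NaN
        System.out.println(density(centreOnly)); // NaN
    }
}
```

In the "spread" case the pairwise distances are 3, 4 and 5, so the mean of 4 normalises to (4 - 3) / (5 - 3) = 0.5. Skipping clusters with fewer than two distinct points, or treating max == min as zero spread, would be one way to avoid the NaN.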
I guess I don't quite understand your question. Can you please elaborate?