[
https://issues.apache.org/jira/browse/MATH-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056577#comment-17056577
]
Chen Tao edited comment on MATH-1521 at 3/11/20, 1:43 AM:
----------------------------------------------------------
{quote}Please provide a use-case.{quote}
These cluster evaluators take two clusterings as scoring parameters:
* [Adjusted Rand index|https://scikit-learn.org/stable/modules/clustering.html#adjusted-rand-index]
* [Mutual Information based scores|https://scikit-learn.org/stable/modules/clustering.html#mutual-information-based-scores]
* [Homogeneity, completeness and V-measure|https://scikit-learn.org/stable/modules/clustering.html#homogeneity-completeness-and-v-measure]
* [Fowlkes-Mallows scores|https://scikit-learn.org/stable/modules/clustering.html#fowlkes-mallows-scores]
When we want to evaluate whether a set of training parameters is as good as an existing set of parameters, we need External Evaluators:
{code:java}
// Create an evaluator
ExternalClusterEvaluator evaluator = new AdjustedRandIndex();
// Create a k-means++ clusterer with unlimited maxIterations as a reference.
KMeansPlusPlusClusterer<DoublePoint> referClusterer =
    new KMeansPlusPlusClusterer<>(5, -1, distanceMeasure, rnd);
List<CentroidCluster<DoublePoint>> referClusters = referClusterer.cluster(points);
// UseCase1: Judge whether various maxIterations values are as good as the unlimited one
for (int i = 100; i < 1000; i += 100) {
    KMeansPlusPlusClusterer<DoublePoint> clusterer =
        new KMeansPlusPlusClusterer<>(5, i, distanceMeasure, rnd);
    // Cluster with the limited clusterer, not the reference one
    List<CentroidCluster<DoublePoint>> clusters = clusterer.cluster(points);
    double score = evaluator.score(referClusters, clusters);
    // Print score for each maxIterations value
    System.out.format("When maxIterations is %d, score is %f%n", i, score);
}
// UseCase2: Judge whether another clusterer is as good as the reference one.
MiniBatchKMeansClusterer<DoublePoint> miniBatchKMeans =
    new MiniBatchKMeansClusterer<>(5, ...);
List<CentroidCluster<DoublePoint>> miniBatchKMeansClusters = miniBatchKMeans.cluster(points);
double score = evaluator.score(referClusters, miniBatchKMeansClusters);
{code}
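To make the kind of score such an evaluator would return concrete, here is a minimal, self-contained sketch of the Adjusted Rand index. It is only an illustration under stated assumptions, not the proposed implementation: the class name {{AdjustedRandIndexSketch}} is hypothetical, and it takes plain integer label arrays (one cluster label per point) instead of lists of Commons Math clusters. Returning 1.0 when both labelings are trivial is also an assumption, made to avoid a division by zero.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, for illustration only; not part of Commons Math. */
public final class AdjustedRandIndexSketch {
    private AdjustedRandIndexSketch() {}

    /** Number of unordered pairs that can be drawn from n items. */
    private static double pairs(long n) {
        return n * (n - 1) / 2.0;
    }

    /**
     * @param labelsA cluster label of each point in the first clustering.
     * @param labelsB cluster label of each point in the second clustering.
     * @return the Adjusted Rand index of the two labelings.
     */
    public static double score(int[] labelsA, int[] labelsB) {
        if (labelsA.length != labelsB.length) {
            throw new IllegalArgumentException("Label arrays must have the same length");
        }
        // Contingency table (cells) and its marginals (row and column sums).
        Map<Integer, Long> rowSums = new HashMap<>();
        Map<Integer, Long> colSums = new HashMap<>();
        Map<Long, Long> cells = new HashMap<>();
        for (int i = 0; i < labelsA.length; i++) {
            rowSums.merge(labelsA[i], 1L, Long::sum);
            colSums.merge(labelsB[i], 1L, Long::sum);
            // Pack the (labelA, labelB) pair into a single long key.
            long key = ((long) labelsA[i] << 32) | (labelsB[i] & 0xFFFFFFFFL);
            cells.merge(key, 1L, Long::sum);
        }
        double sumCells = 0;
        for (long c : cells.values()) {
            sumCells += pairs(c);
        }
        double sumRows = 0;
        for (long r : rowSums.values()) {
            sumRows += pairs(r);
        }
        double sumCols = 0;
        for (long c : colSums.values()) {
            sumCols += pairs(c);
        }
        double total = pairs(labelsA.length);
        if (total == 0) {
            return 1.0; // Assumption: fewer than two points counts as perfect agreement.
        }
        double expected = sumRows * sumCols / total;
        double max = (sumRows + sumCols) / 2.0;
        if (max == expected) {
            return 1.0; // Assumption: both labelings are trivial, treat as identical.
        }
        return (sumCells - expected) / (max - expected);
    }
}
```

The score depends only on which points are grouped together, not on the label values, so {{score(new int[] {0, 0, 1, 1}, new int[] {1, 1, 0, 0})}} evaluates to 1.0. An evaluator behind the interface discussed here would first have to map each point to the index of the cluster that contains it.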
> A interface to implements various of clusters external measurers
> ----------------------------------------------------------------
>
> Key: MATH-1521
> URL: https://issues.apache.org/jira/browse/MATH-1521
> Project: Commons Math
> Issue Type: New Feature
> Reporter: Chen Tao
> Priority: Minor
>
> There are many cluster evaluation algorithms:
> [scikit-learn clustering-performance-evaluation|https://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation]
> They can be divided into 2 categories: "External Measurers" and "Internal Measurers".
> "External Measurers" evaluate a clustering with reference to another clustering.
> As opposed to "Internal Measurers", the interface for "External Measurers" may be:
> {code:java}
> public interface ClusterExternalEvaluator {
>     /**
>      * @param clusters1 First list of clusters.
>      * @param clusters2 Second list of clusters, scored against the first.
>      * @return the score attributed by the evaluator.
>      */
>     <T extends Clusterable> double score(List<? extends Cluster<? extends T>> clusters1,
>                                          List<? extends Cluster<? extends T>> clusters2);
>     /**
>      * @param a Score computed by this evaluator.
>      * @param b Score computed by this evaluator.
>      * @return true if the evaluator considers score {@code a}
>      * better than score {@code b}.
>      */
>     boolean isBetterScore(double a, double b);
> }
> {code}