[ 
https://issues.apache.org/jira/browse/MATH-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17056577#comment-17056577
 ] 

Chen Tao edited comment on MATH-1521 at 3/11/20, 1:43 AM:
----------------------------------------------------------

{quote}Please provide a use-case.{quote}
Each of the following cluster evaluators takes two clusterings of the same data as scoring parameters:
* [Adjusted Rand index|https://scikit-learn.org/stable/modules/clustering.html#adjusted-rand-index]
* [Mutual Information based scores|https://scikit-learn.org/stable/modules/clustering.html#mutual-information-based-scores]
* [Homogeneity, completeness and V-measure|https://scikit-learn.org/stable/modules/clustering.html#homogeneity-completeness-and-v-measure]
* [Fowlkes-Mallows scores|https://scikit-learn.org/stable/modules/clustering.html#fowlkes-mallows-scores]
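
To make "takes two clusterings" concrete, here is a minimal sketch of the Adjusted Rand index in plain Java. It is only an illustration, not the proposed Commons Math API: the class name {{AdjustedRandIndexSketch}} is hypothetical, and representing each clustering as an {{int[]}} of cluster labels (one per point) is an assumption made to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: Adjusted Rand index of two labelings of the same points. */
final class AdjustedRandIndexSketch {

    /** Number of unordered pairs that can be drawn from n items. */
    private static double pairs(long n) {
        return n * (n - 1) / 2.0;
    }

    /**
     * @param labelsA cluster label of each point in the first clustering (assumed representation)
     * @param labelsB cluster label of each point in the second clustering
     * @return the Adjusted Rand index; 1.0 means the two clusterings are identical up to relabeling
     */
    static double score(int[] labelsA, int[] labelsB) {
        if (labelsA.length != labelsB.length) {
            throw new IllegalArgumentException("Both labelings must cover the same points");
        }
        int n = labelsA.length;
        if (n < 2) {
            return 1.0; // no pairs to disagree on; treated as perfect agreement in this sketch
        }

        // Contingency counts: cluster sizes in A, in B, and sizes of their intersections.
        Map<Integer, Long> sizesA = new HashMap<>();
        Map<Integer, Long> sizesB = new HashMap<>();
        Map<String, Long> joint = new HashMap<>();
        for (int i = 0; i < n; i++) {
            sizesA.merge(labelsA[i], 1L, Long::sum);
            sizesB.merge(labelsB[i], 1L, Long::sum);
            joint.merge(labelsA[i] + ":" + labelsB[i], 1L, Long::sum);
        }

        double sumJoint = 0;
        for (long c : joint.values()) {
            sumJoint += pairs(c);
        }
        double sumA = 0;
        for (long c : sizesA.values()) {
            sumA += pairs(c);
        }
        double sumB = 0;
        for (long c : sizesB.values()) {
            sumB += pairs(c);
        }

        double expected = sumA * sumB / pairs(n); // agreement expected by chance
        double max = (sumA + sumB) / 2.0;         // largest attainable agreement
        if (max == expected) {
            return 1.0; // degenerate case, e.g. both clusterings consist of a single cluster
        }
        return (sumJoint - expected) / (max - expected);
    }

    public static void main(String[] args) {
        int[] reference = {0, 0, 1, 1};
        int[] candidate = {0, 0, 1, 2};
        System.out.println(score(reference, candidate));
    }
}
```

The score depends only on which points are grouped together, so swapping the label names in one clustering does not change it. This is what makes it usable for comparing the output of two independent clusterer runs.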

When we want to evaluate whether a set of training parameters is as good as an
existing set of parameters, we need external evaluators:

{code:java}
// Create an evaluator.
ExternalClusterEvaluator evaluator = new AdjustedRandIndex();
// Create a k-means++ clusterer with unlimited maxIterations as a reference.
KMeansPlusPlusClusterer<DoublePoint> referClusterer =
    new KMeansPlusPlusClusterer<>(5, -1, distanceMeasure, rnd);
List<CentroidCluster<DoublePoint>> referClusters = referClusterer.cluster(points);

// Use case 1: judge whether various maxIterations values are as good as the unlimited one.
for (int i = 100; i < 1000; i += 100) {
    KMeansPlusPlusClusterer<DoublePoint> clusterer =
        new KMeansPlusPlusClusterer<>(5, i, distanceMeasure, rnd);
    // Cluster with the iteration-limited clusterer, not with the reference one.
    List<CentroidCluster<DoublePoint>> clusters = clusterer.cluster(points);
    double score = evaluator.score(referClusters, clusters);
    // Print the score for each maxIterations value.
    System.out.format("When maxIterations is %d, score is %f%n", i, score);
}

// Use case 2: judge whether another clusterer is as good as the reference one.
MiniBatchKMeansClusterer<DoublePoint> miniBatchKMeans =
    new MiniBatchKMeansClusterer<>(5, ...);
List<CentroidCluster<DoublePoint>> miniBatchKMeansClusters =
    miniBatchKMeans.cluster(points);
double score = evaluator.score(referClusters, miniBatchKMeansClusters);
{code}



> A interface to implements various of clusters external measurers
> ----------------------------------------------------------------
>
>                 Key: MATH-1521
>                 URL: https://issues.apache.org/jira/browse/MATH-1521
>             Project: Commons Math
>          Issue Type: New Feature
>            Reporter: Chen Tao
>            Priority: Minor
>
> There are many cluster evaluation algorithms:
> [scikit-learn clustering-performance-evaluation|https://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation]
> They can be divided into 2 categories: "External Measurers" and "Internal
> Measurers".
> The "External Measurers" evaluate a set of clusters with reference to another
> set of clusters.
> As opposed to the "Internal Measurers", the interface for "External Measurers" may be:
> {code:java}
> public interface ClusterExternalEvaluator {
>     /**
>      * @param clusters1 First list of clusters.
>      * @param clusters2 Second list of clusters, compared with the first.
>      * @return the score attributed by the evaluator.
>      */
>     <T extends Clusterable> double score(List<? extends Cluster<? extends T>> clusters1,
>                                          List<? extends Cluster<? extends T>> clusters2);
>     /**
>      * @param a Score computed by this evaluator.
>      * @param b Score computed by this evaluator.
>      * @return true if the evaluator considers score {@code a}
>      * better than score {@code b}.
>      */
>     boolean isBetterScore(double a, double b);
> }
> {code}
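
As a rough illustration of how an implementation of such an interface could look, the sketch below computes the Fowlkes-Mallows score. It does not use the Commons Math types: {{ExternalEvaluatorSketch}} and {{FowlkesMallowsSketch}} are hypothetical names, and each cluster is assumed to be a list of integer point ids so that the example compiles on its own.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of an external evaluator: the Fowlkes-Mallows score. */
final class FowlkesMallowsSketch implements ExternalEvaluatorSketch {

    /** Number of unordered pairs that can be drawn from n items. */
    private static double pairs(long n) {
        return n * (n - 1) / 2.0;
    }

    @Override
    public double score(List<List<Integer>> clusters1, List<List<Integer>> clusters2) {
        double together1 = 0;    // pairs grouped together in the first clustering
        double together2 = 0;    // pairs grouped together in the second clustering
        double togetherBoth = 0; // pairs grouped together in both clusterings
        for (List<Integer> c1 : clusters1) {
            together1 += pairs(c1.size());
            for (List<Integer> c2 : clusters2) {
                long common = c1.stream().filter(c2::contains).count();
                togetherBoth += pairs(common);
            }
        }
        for (List<Integer> c2 : clusters2) {
            together2 += pairs(c2.size());
        }
        if (together1 == 0 || together2 == 0) {
            return 0.0; // convention chosen for this sketch when one side groups no pairs
        }
        // Geometric mean of pairwise precision and recall.
        return togetherBoth / Math.sqrt(together1 * together2);
    }

    @Override
    public boolean isBetterScore(double a, double b) {
        return a > b; // for this measure, a higher score means closer agreement
    }

    public static void main(String[] args) {
        List<List<Integer>> reference = Arrays.asList(
            Arrays.asList(0, 1, 2), Arrays.asList(3, 4, 5));
        List<List<Integer>> candidate = Arrays.asList(
            Arrays.asList(0, 1), Arrays.asList(2, 3), Arrays.asList(4, 5));
        System.out.println(new FowlkesMallowsSketch().score(reference, candidate));
    }
}

/** Simplified stand-in for the proposed interface; clusters are lists of point ids (assumption). */
interface ExternalEvaluatorSketch {
    double score(List<List<Integer>> clusters1, List<List<Integer>> clusters2);

    boolean isBetterScore(double a, double b);
}
```

Keeping {{isBetterScore}} on the interface lets callers rank scores without knowing whether a given measure is "higher is better" or "lower is better".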



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
