Hello.
2020-03-06 9:48 UTC+01:00, [email protected] <[email protected]>:
> Hi,
> For centroid-based clustering algorithms in machine learning, we often use
> the Calinski-Harabasz score to evaluate which algorithm, or how many
> centers, is best for a dataset.
> The Python library scikit-learn implements Calinski-Harabasz as
> sklearn.metrics.calinski_harabasz_score.
Could you post a reference (most of our documentation points
to "Wikipedia" or "MathWorld")?
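For context, the score mentioned in the quoted message is commonly defined as follows. Take n points split into k clusters C_j, where cluster C_j has n_j points and centroid c_j, and let c be the centroid of the whole dataset. B is the between-cluster dispersion and W the within-cluster dispersion. Higher scores mean more compact, better-separated clusters, and the index is only defined for 2 <= k <= n - 1:

```latex
\mathrm{CH} = \frac{B / (k - 1)}{W / (n - k)}, \qquad
B = \sum_{j=1}^{k} n_j \,\lVert c_j - c \rVert^2, \qquad
W = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - c_j \rVert^2
```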
> I think there should be a CalinskiHarabaszClusterEvaluator in Commons Math:
At first sight, the approach would be to define a functional
interface (with the "score" method), and then an "enum" that
would act as a factory of evaluators, along the lines of what
has been done in "Commons RNG" (see class "RandomSource"[1]).
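Here is a minimal sketch of that design. So that it compiles on its own, it assumes points are plain `double[]` arrays and a clustering is a `List<List<double[]>>`, instead of the Commons Math `Cluster`/`Clusterable` types. All names are hypothetical and do not describe an existing Commons Math API. These include `ClusterEvaluator`, which here is a stand-in for the current abstract class, as well as `ClusterEvaluatorSource`, `SUM_OF_SQUARED_ERRORS` and `create()`. The single constant uses the within-cluster sum of squared errors only because it is short. A Calinski-Harabasz constant would be added the same way.

```java
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical functional interface with a single "score" method. */
@FunctionalInterface
interface ClusterEvaluator {
    /** Scores a clustering given as a list of clusters, each a list of points. */
    double score(List<List<double[]>> clusters);
}

/**
 * Hypothetical enum acting as a factory of evaluators,
 * in the spirit of Commons RNG's "RandomSource".
 */
enum ClusterEvaluatorSource {
    /** Sum of squared Euclidean distances from each point to its cluster centroid (lower is better). */
    SUM_OF_SQUARED_ERRORS(() -> ClusterEvaluatorSource::sumOfSquaredErrors);
    // Further constants (e.g. one for Calinski-Harabasz) would follow the same pattern.

    private final Supplier<ClusterEvaluator> factory;

    ClusterEvaluatorSource(Supplier<ClusterEvaluator> factory) {
        this.factory = factory;
    }

    /** Returns a new evaluator for this source. */
    ClusterEvaluator create() {
        return factory.get();
    }

    private static double sumOfSquaredErrors(List<List<double[]>> clusters) {
        double total = 0;
        for (List<double[]> cluster : clusters) {
            if (cluster.isEmpty()) {
                continue; // an empty cluster contributes nothing
            }
            double[] centroid = centroid(cluster);
            for (double[] p : cluster) {
                total += squaredDistance(p, centroid);
            }
        }
        return total;
    }

    /** Mean of the points of a non-empty cluster. */
    private static double[] centroid(List<double[]> cluster) {
        double[] c = new double[cluster.get(0).length];
        for (double[] p : cluster) {
            for (int d = 0; d < c.length; d++) {
                c[d] += p[d];
            }
        }
        for (int d = 0; d < c.length; d++) {
            c[d] /= cluster.size();
        }
        return c;
    }

    private static double squaredDistance(double[] a, double[] b) {
        double s = 0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            s += diff * diff;
        }
        return s;
    }
}
```

Usage would then look like `ClusterEvaluatorSource.SUM_OF_SQUARED_ERRORS.create().score(clusters)`. Callers only see the interface, and adding an evaluator means adding one enum constant.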
> ```java
> package org.apache.commons.math4.ml.clustering.evaluation;
>
> import org.apache.commons.math4.ml.clustering.Cluster;
> import org.apache.commons.math4.ml.clustering.Clusterable;
>
> import java.util.List;
>
> public class CalinskiHarabaszClusterEvaluator<T extends Clusterable>
>         extends ClusterEvaluator<T> {
>     @Override
>     public double score(List<? extends Cluster<T>> clusters) {
>         // TODO: Implement the Calinski-Harabasz score algorithm
>         return 0;
>     }
>
>     @Override
>     public boolean isBetterScore(double score1, double score2) {
>         return score1 > score2;
>     }
This method does not seem very useful.
> }
> ```
>
> The code can be implemented by reading the algorithm's documentation,
> or by translating sklearn.metrics.calinski_harabasz_score from Python.
What's the license of that code?
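The computation itself is short enough to write directly from the definition rather than translate. Below is a dependency-free sketch, assuming points are `double[]` arrays and a clustering is a `List<List<double[]>>`. The class name `CalinskiHarabasz` is hypothetical. B is the size-weighted sum of squared distances from each cluster centroid to the overall centroid. W is the sum of squared distances from each point to its own cluster centroid. When W is 0, the sketch returns 1.0, the value scikit-learn returns in that case.

```java
import java.util.List;

/**
 * Hypothetical, dependency-free sketch of the Calinski-Harabasz score.
 * Points are plain double[] arrays; each cluster is a list of points.
 */
final class CalinskiHarabasz {
    private CalinskiHarabasz() {
    }

    /**
     * Computes (B / (k - 1)) / (W / (n - k)); higher is better.
     *
     * @throws IllegalArgumentException if a cluster is empty, or unless 2 <= k < n
     */
    static double score(List<List<double[]>> clusters) {
        int k = clusters.size();
        int n = 0;
        for (List<double[]> cluster : clusters) {
            if (cluster.isEmpty()) {
                throw new IllegalArgumentException("empty cluster");
            }
            n += cluster.size();
        }
        if (k < 2 || k >= n) {
            throw new IllegalArgumentException("need 2 <= k < n, got k=" + k + ", n=" + n);
        }

        // Centroid of the whole dataset: mean of all n points.
        int dim = clusters.get(0).get(0).length;
        double[] overall = new double[dim];
        for (List<double[]> cluster : clusters) {
            for (double[] p : cluster) {
                for (int d = 0; d < dim; d++) {
                    overall[d] += p[d];
                }
            }
        }
        for (int d = 0; d < dim; d++) {
            overall[d] /= n;
        }

        double between = 0; // B: sum over clusters of n_j * ||c_j - c||^2
        double within = 0;  // W: sum over points x in C_j of ||x - c_j||^2
        for (List<double[]> cluster : clusters) {
            double[] centroid = centroid(cluster, dim);
            between += cluster.size() * squaredDistance(centroid, overall);
            for (double[] p : cluster) {
                within += squaredDistance(p, centroid);
            }
        }
        // Degenerate case: every point sits exactly on its cluster centroid.
        if (within == 0) {
            return 1.0;
        }
        return (between / (k - 1)) / (within / (n - k));
    }

    private static double[] centroid(List<double[]> cluster, int dim) {
        double[] c = new double[dim];
        for (double[] p : cluster) {
            for (int d = 0; d < dim; d++) {
                c[d] += p[d];
            }
        }
        for (int d = 0; d < dim; d++) {
            c[d] /= cluster.size();
        }
        return c;
    }

    private static double squaredDistance(double[] a, double[] b) {
        double s = 0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            s += diff * diff;
        }
        return s;
    }
}
```

For example, take the clusters {(0,0), (0,2)} and {(10,0), (10,2)}. The overall centroid is (5,1), so B = 2*25 + 2*25 = 100 and W = 4, giving CH = (100/1) / (4/2) = 50. Adapting this to the `Cluster<T>` type in the quoted skeleton would mostly mean mapping each cluster's `getPoints()` through `Clusterable.getPoint()`.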
Regards,
Gilles
[1]
https://commons.apache.org/proper/commons-rng/commons-rng-simple/javadocs/api-1.3/org/apache/commons/rng/simple/RandomSource.html