Hi,

>Hello.
>
>>>> [...]
>>>> >> For machine learning centroid cluster algorithms, we often use the
>>>> >> Calinski-Harabasz score to evaluate which algorithm or how many
>>>> >> centers is
>>>> >> best for a dataset.
>>>> >> The python lib sklearn implements Calinski-Harabasz as
>>>> >> sklearn.metrics.calinski_harabasz_score.
>>>> >
>>>> >Could you post a reference (most of our documentation points
>>>> >to "Wikipedia" or "MathWorld")?
>>>>
>>>> "Calinski-Harabasz" is the most popular evaluator for centroid clusters
>>>> as far as I know.
>>>> I just read the code of sklearn, and think it is easy to implement.
>>>> https://scikit-learn.org/stable/modules/generated/sklearn.metrics.calinski_harabasz_score.html
>>>> https://www.tandfonline.com/doi/abs/10.1080/03610927408827101
>>>
>>>Thanks; the original reference is quite fine too.
>>>
>>>> >
>>>> >> I think there should be a CalinskiHarabaszClusterEvaluator in commons
>>>> >> math:
>>>> >
>>>> >At first sight, the approach would be to define a functional
>>>> >interface (with the "score" method).
>>>> >Then an "enum" that would be a factory of evaluators, along
>>>> >the lines of what has been done in "Commons RNG" (see class
>>>> >"RandomSource"[1]).
>>>>
>>>> I just inherited the design of "ClusterEvaluator",
>>>> and I think changing the design of the existing API is another question.
>>>
>>>Not really: IMHO we should not pile features on top of an
>>>API that might have shortcomings. In particular, the fact
>>>that the new class's constructor calls the parent's constructor
>>>with "null" looks problematic to me.
>>
>> IMHO "ClusterEvaluator" should be an interface (not just a functional
>> interface) as I described below.
>
>How about renaming it something more explicit and
>enforce unambiguous semantics for the ordering? E.g.
>
>@FunctionalInterface
>public interface ClusterRanking<T extends Clusterable> {
>    /**
>     * Computes the rank (higher is better).
>     *
>     * @param clusters Clusters to be evaluated.
>     * @return the rank of the provided {@code clusters}.
>     */
>    double compute(List<? extends Cluster<T>> clusters);
>}
>[...]
>>>
>>>I've now seen how this is used by "MultiKMeansPlusPlusClusterer".
>>>However, I wonder why the "Multi" feature is only available for that
>>>implementation...
>>>
>>
>> SumOfClusterVariances is just one of the ClusterEvaluator algorithms.
>> IMHO it is necessary to tell the user which score is better for each
>> ClusterEvaluator,
>
>Not if we enforce semantics (cf. above).
>
>> but "smaller is better" cannot be the default implementation
>> of ClusterEvaluator.isBetterScore,
>> and the name "isBetterScore" may be ambiguous (is the second score better than
>> the first?)
>
>The convention is set by the documentation. However,
>the current API could be made simpler with the above
>proposal.
>
>>
>> The ClusterEvaluator can be an interface:
>> Solution 1: Compatible with the old API:
>> ```java
>> public interface ClusterEvaluator<T extends Clusterable>{
>>     double score(Collection<? extends Cluster<T>> clusters);
>>     // Keep old API for compatibility
>>     boolean isBetterScore(double score1, double score2);
>> }
>> ```
>> Solution 2: Use an explicit function name
>> ```java
>> public interface ClusterEvaluator<T extends Clusterable>{
>>     double score(Collection<? extends Cluster<T>> clusters);
>>     // Use an explicit name
>>     boolean isScoreImproved(double originScore, double newScore);
>> }
>> ```
>
>Solution 3 is "ClusterRanking".
>In cases where the reference algorithm would assume the
>other convention (i.e. "lower is better"), the implementation
>is required to apply a conversion (e.g. return the opposite).
>

If ClusterRanking is a functional interface, I do not see how to switch
smoothly between "SumOfClusterVariances", "CalinskiHarabasz" and other
ClusterEvaluator implementations.
Although "SumOfClusterVariances" is the default evaluator in
"MultiKMeansPlusPlusClusterer" (maybe because it is the only
implementation for now), I think "CalinskiHarabasz" would be a better
replacement.
As far as I can see, switching smoothly requires "score" and
"isBetterScore" to be in the same interface.
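
To make the switching argument concrete, here is a minimal, self-contained sketch of the "Solution 1" shape. It is not the Commons Math API: "SimpleClusterEvaluator", "Geometry", "WithinClusterSumOfSquares", "CalinskiHarabaszScore" and "EvaluatorSwitchDemo" are hypothetical names, a cluster is simplified to a plain list of points, and the "lower is better" evaluator is a plain within-cluster sum of squares standing in for "SumOfClusterVariances". The Calinski-Harabasz score assumes at least two clusters and a non-zero within-cluster dispersion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for "Solution 1"; a cluster is just a list of points. */
interface SimpleClusterEvaluator {
    double score(List<List<double[]>> clusters);

    /** @return true if score1 is better than score2. */
    boolean isBetterScore(double score1, double score2);
}

/** Shared helpers (hypothetical, for illustration only). */
final class Geometry {
    static double[] centroid(List<double[]> points) {
        double[] c = new double[points.get(0).length];
        for (double[] p : points) {
            for (int i = 0; i < c.length; i++) {
                c[i] += p[i] / points.size();
            }
        }
        return c;
    }

    static double squaredDistance(double[] a, double[] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) {
            d += (a[i] - b[i]) * (a[i] - b[i]);
        }
        return d;
    }

    /** Sum over clusters of squared distances from each point to its cluster centroid. */
    static double withinSumOfSquares(List<List<double[]>> clusters) {
        double sum = 0;
        for (List<double[]> cluster : clusters) {
            double[] center = centroid(cluster);
            for (double[] p : cluster) {
                sum += squaredDistance(p, center);
            }
        }
        return sum;
    }
}

/** Lower is better (simplified stand-in, not SumOfClusterVariances itself). */
final class WithinClusterSumOfSquares implements SimpleClusterEvaluator {
    @Override
    public double score(List<List<double[]>> clusters) {
        return Geometry.withinSumOfSquares(clusters);
    }

    @Override
    public boolean isBetterScore(double score1, double score2) {
        return score1 < score2;
    }
}

/** Higher is better: (between / (k - 1)) / (within / (n - k)). */
final class CalinskiHarabaszScore implements SimpleClusterEvaluator {
    @Override
    public double score(List<List<double[]>> clusters) {
        List<double[]> all = new ArrayList<>();
        for (List<double[]> cluster : clusters) {
            all.addAll(cluster);
        }
        double[] global = Geometry.centroid(all);
        double between = 0;
        for (List<double[]> cluster : clusters) {
            between += cluster.size()
                * Geometry.squaredDistance(Geometry.centroid(cluster), global);
        }
        double within = Geometry.withinSumOfSquares(clusters);
        int n = all.size();
        int k = clusters.size();
        return (between / (k - 1)) / (within / (n - k));
    }

    @Override
    public boolean isBetterScore(double score1, double score2) {
        return score1 > score2;
    }
}

final class EvaluatorSwitchDemo {
    /** Builds a cluster of 1-D points. */
    static List<double[]> cluster(double... xs) {
        List<double[]> points = new ArrayList<>();
        for (double x : xs) {
            points.add(new double[] {x});
        }
        return points;
    }

    /** Two candidate clusterings of {0, 1, 10, 11}: a poor one, then a good one. */
    static List<List<List<double[]>>> candidates() {
        return Arrays.asList(
            Arrays.asList(cluster(0, 10), cluster(1, 11)),
            Arrays.asList(cluster(0, 1), cluster(10, 11)));
    }

    /** The selection loop never needs to know which direction is "better". */
    static int indexOfBest(SimpleClusterEvaluator evaluator,
                           List<List<List<double[]>>> candidates) {
        int best = 0;
        double bestScore = evaluator.score(candidates.get(0));
        for (int i = 1; i < candidates.size(); i++) {
            double s = evaluator.score(candidates.get(i));
            if (evaluator.isBetterScore(s, bestScore)) {
                best = i;
                bestScore = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(indexOfBest(new WithinClusterSumOfSquares(), candidates()));
        System.out.println(indexOfBest(new CalinskiHarabaszScore(), candidates()));
    }
}
```

In this sketch, swapping the evaluator is a one-argument change to "indexOfBest", and each evaluator carries its own ordering convention. With a single-method "ClusterRanking", the same loop would compare with ">" and the "lower is better" implementation would have to return the negated value instead.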

> [...]
>"master".
>You could probably "rebase" your branch on it; try
>  $ git rebase master
>

Thanks for your guidance, I will follow it next time.

>>>> [...]