Hi,

>Hello.
>
>>>> [...]
>>>> >>     For centroid-based clustering algorithms in machine learning, we
>>>> >> often use the Calinski-Harabasz score to evaluate which algorithm or
>>>> >> how many centers is best for a dataset.
>>>> >>     The Python library sklearn implements Calinski-Harabasz as
>>>> >> sklearn.metrics.calinski_harabasz_score.
>>>> >
>>>> >Could you post a reference (most of our documentation points
>>>> >to "Wikipedia" or "MathWorld")?
>>>>
>>>> "Calinski-Harabasz" is the most popular evaluator for centroid-based
>>>> clustering, as far as I know.
>>>> I just read the sklearn code, and I think it is easy to implement.
>>>> https://scikit-learn.org/stable/modules/generated/sklearn.metrics.calinski_harabasz_score.html
>>>> https://www.tandfonline.com/doi/abs/10.1080/03610927408827101
>>>
>>>Thanks; the original reference is quite fine too.
>>>
>>>> >
>>>> >> I think there should be a CalinskiHarabaszClusterEvaluator in commons
>>>> >> math:
>>>> >
>>>> >At first sight, the approach would be to define a functional
>>>> >interface (with the "score" method).
>>>> >Then an "enum" that would be a factory of evaluators, along
>>>> >the lines of what has been done in "Commons RNG" (see class
>>>> >"RandomSource"[1]).
>>>>
>>>> I just inherited the design of "ClusterEvaluator",
>>>> and I think changing the design of the existing API is a separate question.
>>>
>>>Not really: IMHO we should not pile features on top of an
>>>API that might have shortcomings.  In particular, the fact
>>>that the new class's constructor calls the parent's constructor
>>>with "null" looks problematic to me.
>>
>> IMHO "ClusterEvaluator" should be an interface (not just a functional
>> interface), as I describe below.
>
>How about renaming it to something more explicit and
>enforcing unambiguous semantics for the ordering?  E.g.
>
>@FunctionalInterface
>public interface ClusterRanking<T extends Clusterable> {
>    /**
>     * Computes the rank (higher is better).
>     *
>     * @param clusters Clusters to be evaluated.
>     * @return the rank of the provided {@code clusters}.
>     */
>    double compute(List<? extends Cluster<T>> clusters);
>}
>[...]
>>>
>>>I've now seen how this is used by "MultiKMeansPlusPlusClusterer".
>>>However, I wonder why the "Multi" feature is only available for that
>>>implementation...
>>>
>>
>> SumOfClusterVariances is just one of the ClusterEvaluator algorithms.
>> IMHO it is necessary to tell the user which score is better for each
>> ClusterEvaluator,
>
>Not if we enforce semantics (cf. above).
>
>> but "smaller is better" cannot be the default implementation
>> of ClusterEvaluator.isBetterScore,
>> and the name "isBetterScore" may be ambiguous (is the second score better
>> than the first?)
>
>The convention is set by the documentation.  However,
>the current API could be made simpler with the above
>proposal.
>
>>
>> The ClusterEvaluator can be an interface:
>> Solution 1: Compatible with the old API:
>> ```java
>> public interface ClusterEvaluator<T extends Clusterable>{
>>     double score(Collection<? extends Cluster<T>> clusters);
>>     // Keep the old API for compatibility
>>     boolean isBetterScore(double score1, double score2);
>> }
>> ```
>> Solution 2: Use an explicit function name
>> ```java
>> public interface ClusterEvaluator<T extends Clusterable>{
>>     double score(Collection<? extends Cluster<T>> clusters);
>>     // Use an explicit name
>>     boolean isScoreImproved(double originScore, double newScore);
>> }
>> ```
>
>Solution 3  is "ClusterRanking".
>In cases where the reference algorithm would assume the
>other convention (i.e. "lower is better"), the implementation
>is required to apply a conversion (e.g. return the opposite).
>
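For concreteness, the conversion described for Solution 3 could look roughly
like the sketch below. It is only an illustration under simplifying
assumptions: a clustering is represented as a plain List<List<double[]>>
instead of the Commons Math Cluster<T> types, and the names "Rankings",
"withinClusterSumOfSquares" and "fromLowerIsBetter" are hypothetical (they are
not existing Commons Math API, and the score is not the actual
"SumOfClusterVariances" code).

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Simplified stand-in for the proposed interface: a higher rank is better. */
@FunctionalInterface
interface ClusterRanking {
    double compute(List<List<double[]>> clusters);
}

final class Rankings {
    private Rankings() {}

    /** Hypothetical "lower is better" score: within-cluster sum of squared distances. */
    static double withinClusterSumOfSquares(List<List<double[]>> clusters) {
        double total = 0;
        for (List<double[]> cluster : clusters) {
            if (cluster.isEmpty()) {
                continue; // An empty cluster contributes nothing.
            }
            int dim = cluster.get(0).length;
            double[] centroid = new double[dim];
            for (double[] p : cluster) {
                for (int i = 0; i < dim; i++) {
                    centroid[i] += p[i] / cluster.size();
                }
            }
            for (double[] p : cluster) {
                for (int i = 0; i < dim; i++) {
                    double d = p[i] - centroid[i];
                    total += d * d;
                }
            }
        }
        return total;
    }

    /** Wraps a "lower is better" score by returning its opposite. */
    static ClusterRanking fromLowerIsBetter(ToDoubleFunction<List<List<double[]>>> score) {
        return clusters -> -score.applyAsDouble(clusters);
    }
}
```

With this, `Rankings.fromLowerIsBetter(Rankings::withinClusterSumOfSquares)`
yields a ranking where the tighter clustering gets the higher value, so callers
only ever compare with "higher is better".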
If ClusterRanking is a functional interface,
I do not see how to switch smoothly between "SumOfClusterVariances" and
"CalinskiHarabasz" or any other ClusterEvaluator.
Although "SumOfClusterVariances" is the default evaluator in
"MultiKMeansPlusPlusClusterer"
(maybe because "SumOfClusterVariances" is the only implementation for now),
I think "CalinskiHarabasz" would be a better replacement.
The only way I can imagine to switch smoothly is to have "score" and
"isBetterScore" in one interface.
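
To illustrate the point, here is a rough sketch of an interface that carries
both the score and its ordering, so that the selection loop does not need to
know which convention applies. It is not the Commons Math code: clusterings are
simplified to List<List<double[]>>, the names "Evaluators" and "BestOf" are
hypothetical, and the sketch assumes non-empty clusters with 1 < k < n.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for "Solution 2": the score and its ordering live together. */
interface ClusterEvaluator {
    double score(List<List<double[]>> clusters);
    boolean isScoreImproved(double originScore, double newScore);
}

final class Evaluators {
    private Evaluators() {}

    static double[] mean(List<double[]> points) {
        double[] m = new double[points.get(0).length];
        for (double[] p : points) {
            for (int i = 0; i < m.length; i++) {
                m[i] += p[i] / points.size();
            }
        }
        return m;
    }

    static double squaredDistance(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            s += (a[i] - b[i]) * (a[i] - b[i]);
        }
        return s;
    }

    static double withinSumOfSquares(List<List<double[]>> clusters) {
        double w = 0;
        for (List<double[]> c : clusters) {
            double[] m = mean(c);
            for (double[] p : c) {
                w += squaredDistance(p, m);
            }
        }
        return w;
    }

    /** Lower is better. */
    static final ClusterEvaluator WITHIN_SUM_OF_SQUARES = new ClusterEvaluator() {
        @Override public double score(List<List<double[]>> clusters) {
            return withinSumOfSquares(clusters);
        }
        @Override public boolean isScoreImproved(double originScore, double newScore) {
            return newScore < originScore;
        }
    };

    /** Higher is better: (B / (k - 1)) / (W / (n - k)). */
    static final ClusterEvaluator CALINSKI_HARABASZ = new ClusterEvaluator() {
        @Override public double score(List<List<double[]>> clusters) {
            List<double[]> all = new ArrayList<>();
            clusters.forEach(all::addAll);
            double[] globalMean = mean(all);
            double between = 0;
            for (List<double[]> c : clusters) {
                between += c.size() * squaredDistance(mean(c), globalMean);
            }
            int n = all.size();
            int k = clusters.size();
            return (between / (k - 1)) / (withinSumOfSquares(clusters) / (n - k));
        }
        @Override public boolean isScoreImproved(double originScore, double newScore) {
            return newScore > originScore;
        }
    };
}

final class BestOf {
    private BestOf() {}

    /** Returns the candidate clustering that the given evaluator considers best. */
    static List<List<double[]>> select(List<List<List<double[]>>> candidates,
                                       ClusterEvaluator evaluator) {
        List<List<double[]>> best = null;
        double bestScore = Double.NaN;
        for (List<List<double[]>> c : candidates) {
            double s = evaluator.score(c);
            if (best == null || evaluator.isScoreImproved(bestScore, s)) {
                best = c;
                bestScore = s;
            }
        }
        return best;
    }
}
```

Switching the evaluator is then a one-argument change in the call to
`BestOf.select`, whichever ordering the evaluator uses.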

> [...]
>"master".
>You could probably "rebase" your branch on it; try
>  $ git rebase master
>

Thanks for your guidance; I will follow it next time.

>>>> [...]
