Hi.

On Thu, 27 Feb 2020 at 06:17, chentao...@qq.com <chentao...@qq.com> wrote:
>
> Hi,
>
> > [...]
> >> >>
> >> >> Do you mean I should file a JIRA issue about reusing "centroidOf"
> >> >> and "chooseInitialCenters",
> >> >> then start a PR and a discussion about "ClusterUtils"?
> >> >> And then start the PR for "MiniBatchKMeansClusterer" once all that is done?
> >> >
> >> >I cannot guarantee that the whole process will be streamlined.
> >> >In effect, you can work on multiple branches (one for each
> >> >prospective PR).
> >> >I'd say that you should start by describing (here on the ML) the
> >> >rationale for "ClusterUtils" (and contrast it with say, a common
> >> >base class).
> >> >[Only when the design has been agreed on should a JIRA issue to
> >> >implement it be created, in order to track the actual
> >> >coding work.]
> >>
> >> OK, I think we should start from here:
> >>
> >> The methods "centroidOf" and "chooseInitialCenters" in
> >> KMeansPlusPlusClusterer
> >>  could be reused by other k-means clusterers, such as the
> >> MiniBatchKMeansClusterer that I want to implement.
> >>
> >> There are two ways to reuse "centroidOf" and "chooseInitialCenters":
> >> 1. Extract an abstract class for k-means clusterers, named
> >> "AbstractKMeansClusterer",
> >>  and move "centroidOf" and "chooseInitialCenters" into it as protected
> >> methods;
> >>  the EmptyClusterStrategy and related logic can also move to
> >> "AbstractKMeansClusterer".
> >> 2. Create a static utility class and move "centroidOf" and
> >> "chooseInitialCenters" into it;
> >>  other useful clustering methods, such as "predict" (which predicts the
> >> best cluster for a given point), could also go there.
> >>
> >
> >At first sight, I prefer option 1.
> >Indeed, among other things, "chooseInitialCenters" is a method that is of
> >no interest to users of the functionality (and so should not be part of
> >the "public" API).
>
> That is a persuasive explanation, and I agree that extracting an abstract
> class for k-means is better.
> How do we reach a conclusion?
> ---------------------------------------------
>
> Speaking of the "public API", I suppose there should be a family of
> "CentroidInitializer" implementations
>  that "chooseInitialCenters" using various algorithms.
> The k-means++ clustering algorithm is a special case of k-means
>  that initializes the cluster centers with the k-means++ algorithm.
> So if there is a "CentroidInitializer", "KMeansPlusPlusClusterer" can be a
> "KMeansClusterer"
>  with a "KMeansPlusPlusCentroidInitializer" strategy.
> When "KMeansClusterer" is initialized with a "RandomCentroidInitializer", it
> is plain k-means.
>
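
The strategy idea above could be sketched roughly as follows. This is only
an illustration, not code from the library: "CentroidInitializer" and its
two implementations are the names proposed above and do not exist yet, the
"KMeansClusterer" skeleton is hypothetical, and plain "double[]" points and
"java.util.Random" are used here as assumptions to keep the example
self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical strategy interface (the name comes from the proposal above).
interface CentroidInitializer {
    // Assumes 1 <= k <= points.size().
    List<double[]> chooseInitialCenters(List<double[]> points, int k, Random rng);
}

// Plain k-means: pick k distinct input points uniformly at random.
class RandomCentroidInitializer implements CentroidInitializer {
    @Override
    public List<double[]> chooseInitialCenters(List<double[]> points, int k, Random rng) {
        List<double[]> shuffled = new ArrayList<>(points);
        Collections.shuffle(shuffled, rng);
        return new ArrayList<>(shuffled.subList(0, k));
    }
}

// k-means++ seeding: each new center is drawn with probability proportional
// to its squared distance to the nearest center chosen so far.
class KMeansPlusPlusCentroidInitializer implements CentroidInitializer {
    @Override
    public List<double[]> chooseInitialCenters(List<double[]> points, int k, Random rng) {
        List<double[]> centers = new ArrayList<>();
        centers.add(points.get(rng.nextInt(points.size())));
        double[] minDistSq = new double[points.size()];
        Arrays.fill(minDistSq, Double.POSITIVE_INFINITY);
        while (centers.size() < k) {
            double[] last = centers.get(centers.size() - 1);
            double total = 0;
            for (int i = 0; i < points.size(); i++) {
                minDistSq[i] = Math.min(minDistSq[i], distSq(points.get(i), last));
                total += minDistSq[i];
            }
            double r = rng.nextDouble() * total;
            int chosen = points.size() - 1; // fallback if all distances are zero
            double acc = 0;
            for (int i = 0; i < points.size(); i++) {
                acc += minDistSq[i];
                if (acc >= r && minDistSq[i] > 0) {
                    chosen = i;
                    break;
                }
            }
            centers.add(points.get(chosen));
        }
        return centers;
    }

    private static double distSq(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }
}

// Hypothetical skeleton: the clusterer only delegates the seeding step.
class KMeansClusterer {
    private final int k;
    private final CentroidInitializer initializer;
    private final Random rng;

    KMeansClusterer(int k, CentroidInitializer initializer, Random rng) {
        this.k = k;
        this.initializer = initializer;
        this.rng = rng;
    }

    List<double[]> initialCenters(List<double[]> points) {
        return initializer.chooseInitialCenters(points, k, rng);
    }
}
```

With such a split, "k-means++" would simply be a "KMeansClusterer" built
with a "KMeansPlusPlusCentroidInitializer", and the seeding method would
not need to appear in the clusterer's public API.
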
> ----------------------------------------------------------
> >Method "centroidOf" looks generally useful.  Shouldn't it be part of
> >the "Cluster"
> >interface?  What is the difference with method "getCenter" (defined by class
> >"CentroidCluster")?
>
> My understanding is:
>  * "Cluster" is a data class that carries the result of a clustering;
> "getCenter" is just a getter of "CentroidCluster" that returns the center
> point.
>  * "Cluster[er]" is (the interface of) an algorithm that assigns points to
> a set of "Cluster" instances.
>  * "CentroidCluster" is the result of a particular group of clusterer
> algorithms, such as k-means;
>  "centroidOf" is the specific logic that computes the center point of a
> collection of points.
> [In contrast, the DBSCAN clustering algorithm does not care about a "centroid".]
>
> So "centroidOf" could be a method of "CentroidCluster[er]" (which does not exist yet),
>  but it is different from "CentroidCluster.getCenter".

I may be missing something about the existing design,
but it seems strange that "CentroidCluster" is initialized
with a given "center", yet it is possible to add points after
initialization (which IIUC would invalidate the "center").
It would seem that "center" should be a property computed
from the contents of "Cluster" e.g.:

@FunctionalInterface
public interface ClusterCenterComputer<T extends Clusterable> {
    T centroidOf(Cluster<T> cluster);
}
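
To illustrate, here is a minimal sketch of one possible implementation,
computing the center as the arithmetic mean of the points currently in the
cluster. The "Clusterable", "Cluster" and "DoublePoint" types below are
reduced stand-ins (an assumption, so that the example is self-contained),
and "MeanCenterComputer" is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Reduced stand-ins for the clustering types, for illustration only.
interface Clusterable {
    double[] getPoint();
}

class Cluster<T extends Clusterable> {
    private final List<T> points = new ArrayList<>();

    void addPoint(T p) {
        points.add(p);
    }

    List<T> getPoints() {
        return points;
    }
}

class DoublePoint implements Clusterable {
    private final double[] point;

    DoublePoint(double... point) {
        this.point = point;
    }

    @Override
    public double[] getPoint() {
        return point;
    }
}

@FunctionalInterface
interface ClusterCenterComputer<T extends Clusterable> {
    T centroidOf(Cluster<T> cluster);
}

// Hypothetical implementation: the center is recomputed from the cluster's
// current contents, so it cannot become stale when points are added.
class MeanCenterComputer implements ClusterCenterComputer<DoublePoint> {
    @Override
    public DoublePoint centroidOf(Cluster<DoublePoint> cluster) {
        List<DoublePoint> pts = cluster.getPoints();
        if (pts.isEmpty()) {
            throw new IllegalArgumentException("empty cluster");
        }
        double[] sum = new double[pts.get(0).getPoint().length];
        for (DoublePoint p : pts) {
            double[] c = p.getPoint();
            for (int i = 0; i < sum.length; i++) {
                sum[i] += c[i];
            }
        }
        for (int i = 0; i < sum.length; i++) {
            sum[i] /= pts.size();
        }
        return new DoublePoint(sum);
    }
}
```

Calling "centroidOf" again after "addPoint" returns the updated mean, which
is the behaviour a center stored at construction time cannot provide.
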

Regards,
Gilles
