Hi.

On Thu, 27 Feb 2020 at 06:17, chentao...@qq.com <chentao...@qq.com> wrote:
>
> Hi,
>
> > [...]
> >> >>
> >> >> Do you mean I should file a JIRA issue about reusing "centroidOf"
> >> >> and "chooseInitialCenters",
> >> >> then start a PR and a discussion about "ClusterUtils"?
> >> >> And then start the PR for "MiniBatchKMeansClusterer" once all that is done?
> >> >
> >> >I cannot guarantee that the whole process will be streamlined.
> >> >In effect, you can work on multiple branches (one for each
> >> >prospective PR).
> >> >I'd say that you should start by describing (here on the ML) the
> >> >rationale for "ClusterUtils" (and contrast it with say, a common
> >> >base class).
> >> >[Only when the design has been agreed on, a JIRA issue to
> >> >implement it should be created in order to track the actual
> >> >coding work.]
> >>
> >> OK, I think we should start from here:
> >>
> >> The methods "centroidOf" and "chooseInitialCenters" in
> >> KMeansPlusPlusClusterer could be reused by other KMeans clusterers,
> >> such as the MiniBatchKMeansClusterer which I want to implement.
> >>
> >> There are two solutions for reusing "centroidOf" and "chooseInitialCenters":
> >> 1. Extract an abstract class for KMeans clusterers, named
> >> "AbstractKMeansClusterer", and move "centroidOf" and
> >> "chooseInitialCenters" into it as protected methods;
> >> the EmptyClusterStrategy and related logic can also move to
> >> "AbstractKMeansClusterer".
> >> 2. Create a static utility class and move "centroidOf" and
> >> "chooseInitialCenters" into it; other useful clustering methods, such as
> >> "predict" (predict which cluster is best for a specified point), can be
> >> put there too.
> >>
> >
> >At first sight, I prefer option 1.
> >Indeed, among other things, "chooseInitialCenters" is a method that is of
> >no interest to users of the functionality (and so should not be part of
> >the "public" API).
>
> That is a persuasive explanation, and I agree with you that extracting an
> abstract class for KMeans is better.
> And how can we reach a conclusion?
> ---------------------------------------------
>
> Speaking of the "public API", I suppose there should be a family of
> "CentroidInitializer" implementations, each providing
> "chooseInitialCenters" with a different algorithm.
> The k-means++ clustering algorithm is a special case of k-means
> that initializes the cluster centers with the k-means++ algorithm.
> So if there is a "CentroidInitializer", "KMeansPlusPlusClusterer" can be a
> "KMeansClusterer" with a "KMeansPlusPlusCentroidInitializer" strategy.
> When "KMeansClusterer" is initialized with a "RandomCentroidInitializer",
> it is the plain k-means.
>
> ----------------------------------------------------------
> >Method "centroidOf" looks generally useful. Shouldn't it be part of
> >the "Cluster" interface? What is the difference with method "getCenter"
> >(defined by class "CentroidCluster")?
>
> My understanding is:
> * "Cluster" is a data class that carries the result of a clustering;
>   "getCenter" is just a getter of "CentroidCluster" that returns the value
>   of the center point.
> * "Cluster[er]" is an (interface of an) algorithm that classifies points
>   into sets of "Cluster".
> * "CentroidCluster" is the result of a particular group of clusterer
>   algorithms such as k-means; "centroidOf" is the specific logic that
>   calculates the center point of a collection of points.
>   [In contrast, the DBSCAN clustering algorithm does not care about the
>   "centroid".]
>
> So "centroidOf" may be a method of "CentroidCluster[er]" (which does not
> exist yet), but it is different from "CentroidCluster.getCenter".
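
To make the "CentroidInitializer" idea quoted above easier to discuss, here is a minimal, self-contained sketch of how such a strategy could look. It is not existing Commons Math code: points are plain "double[]" rather than "Clusterable", the signature of "chooseInitialCenters" and the "seed" method are assumptions made for illustration, and "KMeansClusterer" only shows where the strategy would be plugged in (the iterations themselves are omitted).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical strategy: chooses k initial centers among the data points (assumes k <= points.size()). */
interface CentroidInitializer {
    List<double[]> chooseInitialCenters(List<double[]> points, int k, Random rng);
}

/** Plain k-means seeding: k data points drawn uniformly at random, without replacement. */
final class RandomCentroidInitializer implements CentroidInitializer {
    @Override
    public List<double[]> chooseInitialCenters(List<double[]> points, int k, Random rng) {
        List<double[]> shuffled = new ArrayList<>(points);
        Collections.shuffle(shuffled, rng);
        return new ArrayList<>(shuffled.subList(0, k));
    }
}

/** k-means++ seeding: each further center is drawn with probability proportional to D(x)^2. */
final class KMeansPlusPlusCentroidInitializer implements CentroidInitializer {
    @Override
    public List<double[]> chooseInitialCenters(List<double[]> points, int k, Random rng) {
        List<double[]> centers = new ArrayList<>();
        // The first center is chosen uniformly at random.
        centers.add(points.get(rng.nextInt(points.size())));
        // Squared distance from each point to its nearest center chosen so far.
        double[] minDistSq = new double[points.size()];
        Arrays.fill(minDistSq, Double.POSITIVE_INFINITY);
        while (centers.size() < k) {
            double[] last = centers.get(centers.size() - 1);
            double sum = 0;
            for (int i = 0; i < points.size(); i++) {
                minDistSq[i] = Math.min(minDistSq[i], distSq(points.get(i), last));
                sum += minDistSq[i];
            }
            // Draw r in [0, sum) and take the first point whose cumulative weight exceeds r.
            double r = rng.nextDouble() * sum;
            int chosen = points.size() - 1; // fallback if all weights are zero or rounding interferes
            double cumulative = 0;
            for (int i = 0; i < points.size(); i++) {
                cumulative += minDistSq[i];
                if (cumulative > r) {
                    chosen = i;
                    break;
                }
            }
            centers.add(points.get(chosen));
        }
        return centers;
    }

    private static double distSq(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }
}

/** Hypothetical clusterer: only the seeding step is shown, the k-means iterations are omitted. */
final class KMeansClusterer {
    private final int k;
    private final CentroidInitializer initializer;
    private final Random rng;

    KMeansClusterer(int k, CentroidInitializer initializer, Random rng) {
        this.k = k;
        this.initializer = initializer;
        this.rng = rng;
    }

    List<double[]> seed(List<double[]> points) {
        return initializer.chooseInitialCenters(points, k, rng);
    }
}
```

Under these assumptions, "new KMeansClusterer(k, new KMeansPlusPlusCentroidInitializer(), rng)" would play the role of today's "KMeansPlusPlusClusterer", while passing a "RandomCentroidInitializer" would give the plain k-means; the initializer stays out of the API seen by users who only call the clusterer.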

I may be missing something about the existing design, but it seems
strange that "CentroidCluster" is initialized with a given "center",
yet it is possible to add points after initialization (which IIUC
would invalidate the "center").
It would seem that "center" should be a property computed from the
contents of "Cluster", e.g.:

    @FunctionalInterface
    public interface ClusterCenterComputer<T extends Clusterable> {
        T centroidOf(Cluster<T> cluster);
    }

Regards,
Gilles
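
P.S. To illustrate computing the center from the contents of the cluster, here is a minimal sketch of one possible implementation of "ClusterCenterComputer". It is only an assumption about how this could look: "Clusterable" and "Cluster" below are simplified stand-ins declared locally (not the library classes), and "SimplePoint" and "MeanCenterComputer" are hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Simplified stand-in for the library's "Clusterable" (not the real interface). */
interface Clusterable {
    double[] getPoint();
}

/** Simplified stand-in for the library's "Cluster": just a growable list of points. */
class Cluster<T extends Clusterable> {
    private final List<T> points = new ArrayList<>();

    public void addPoint(T point) {
        points.add(point);
    }

    public List<T> getPoints() {
        return Collections.unmodifiableList(points);
    }
}

/** The interface proposed above. */
@FunctionalInterface
interface ClusterCenterComputer<T extends Clusterable> {
    T centroidOf(Cluster<T> cluster);
}

/** Hypothetical immutable point type used only for this example. */
final class SimplePoint implements Clusterable {
    private final double[] coords;

    SimplePoint(double... coords) {
        this.coords = coords.clone();
    }

    @Override
    public double[] getPoint() {
        return coords.clone();
    }
}

/** Hypothetical implementation: the center is the coordinate-wise mean of the points. */
final class MeanCenterComputer implements ClusterCenterComputer<SimplePoint> {
    @Override
    public SimplePoint centroidOf(Cluster<SimplePoint> cluster) {
        List<SimplePoint> pts = cluster.getPoints();
        if (pts.isEmpty()) {
            throw new IllegalArgumentException("Cannot compute the center of an empty cluster");
        }
        double[] mean = new double[pts.get(0).getPoint().length];
        for (SimplePoint p : pts) {
            double[] c = p.getPoint();
            for (int i = 0; i < mean.length; i++) {
                mean[i] += c[i];
            }
        }
        for (int i = 0; i < mean.length; i++) {
            mean[i] /= pts.size();
        }
        return new SimplePoint(mean);
    }
}
```

With such a design the center is recomputed on demand, so it cannot become stale when "addPoint" is called after construction; whether the extra computation is acceptable inside the k-means loop would need to be checked.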