Hello All,
We are developing a k-means clustering extension. k-means is an unsupervised learning algorithm that provides a simple way to partition a given data set into a fixed number of clusters.

The standard k-means clustering algorithm is nondeterministic: running it several times on the same input data can give different results. The reason is that the algorithm randomly chooses k observations from the data set and uses them as the initial means. Here we implement a variant of k-means in which the initial cluster centers are the first k distinct values. This ensures the same output for a given input.

Function parameters:

  - Data point to be clustered
  - Number of cluster centers (k)
  - Number of iterations (m)
  - Number of events for which the model is trained (x)

The cluster centers are initialized from the first k distinct events in the stream, where k is the number of cluster centers. The model is trained for every x events received. Once the first x events have been received, an output is produced for each event. The output consists of the value of the cluster center to which the data point belongs, the id of that cluster center, and the distance from the data point to the cluster center.

The clustering can be performed for a given window implementation, i.e. time, time batch, or length.

--
Malith Jayasinghe
WSO2, Inc. (http://wso2.com)
Email : [email protected]
Mobile : 0770704040
Blog : https://medium.com/@malith.jayasinghe
Lean . Enterprise . Middleware
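
For illustration, the behaviour described above can be sketched in Python as follows. This is not the extension's code. The class name StreamingKMeans and its method names are hypothetical, data points are assumed to be single numeric values, each training run is assumed to use every event buffered so far (standing in for the window contents), and an empty cluster is assumed to keep its previous center.

```python
from typing import List, Optional, Tuple


class StreamingKMeans:
    """Illustrative 1-D k-means with deterministic initialization."""

    def __init__(self, k: int, max_iterations: int, train_every: int) -> None:
        self.k = k                            # number of cluster centers
        self.max_iterations = max_iterations  # m in the description
        self.train_every = train_every        # x in the description
        self.events: List[float] = []         # assumed stand-in for the window
        self.centers: List[float] = []        # empty until the first training run

    def _nearest(self, value: float) -> int:
        # Index of the closest center; ties go to the lowest index.
        return min(range(len(self.centers)),
                   key=lambda i: abs(value - self.centers[i]))

    def _train(self) -> None:
        # Deterministic start: the first k distinct values, in arrival order.
        distinct = list(dict.fromkeys(self.events))[: self.k]
        if len(distinct) < self.k:
            return  # not enough distinct values yet; retry at the next run
        self.centers = distinct
        for _ in range(self.max_iterations):
            groups: List[List[float]] = [[] for _ in self.centers]
            for value in self.events:
                groups[self._nearest(value)].append(value)
            updated = [
                sum(g) / len(g) if g else c  # an empty cluster keeps its center
                for g, c in zip(groups, self.centers)
            ]
            if updated == self.centers:
                break  # converged before using all iterations
            self.centers = updated

    def process(self, value: float) -> Optional[Tuple[float, int, float]]:
        """Return (center value, center id, distance), or None before training."""
        self.events.append(value)
        if len(self.events) % self.train_every == 0:
            self._train()
        if not self.centers:
            return None
        idx = self._nearest(value)
        return self.centers[idx], idx, abs(value - self.centers[idx])


if __name__ == "__main__":
    model = StreamingKMeans(k=2, max_iterations=10, train_every=4)
    for point in [1.0, 2.0, 9.0, 10.0, 1.5]:
        print(point, model.process(point))
```

Because the initial centers depend only on arrival order, two instances fed the same stream produce identical outputs. In this sketch no output is emitted until the first x events have arrived and at least k distinct values have been seen.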
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
