Hello All,


We are developing a k-means clustering extension. k-means is an unsupervised
learning algorithm that provides a simple way to partition a given data set
into a fixed number of clusters. The standard k-means clustering algorithm is
nondeterministic: we can get different results for the same input data when
we run the algorithm multiple times. The reason is that the algorithm
randomly chooses k observations from the data set and uses these as the
initial means. Here we implement a variant of k-means in which the initial
cluster centers are the first k distinct values. This ensures the same
output for a given input.
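
To illustrate the deterministic initialization, here is a minimal Python
sketch. It assumes one-dimensional numeric data points, and the function name
first_k_distinct is made up for this example; it is not the extension's
actual code.

```python
def first_k_distinct(values, k):
    """Return the first k distinct values of `values`, in arrival order."""
    centers = []
    for value in values:
        if len(centers) == k:
            break  # enough initial centers collected
        if value not in centers:
            centers.append(value)
    return centers


stream = [5.0, 5.0, 1.0, 9.0, 1.0, 3.0]
print(first_k_distinct(stream, 3))  # [5.0, 1.0, 9.0]
```

Because nothing here depends on a random choice, the same stream always gives
the same initial centers. If the stream holds fewer than k distinct values,
this sketch simply returns fewer centers.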



Function parameters:

  - Data point to be clustered
  - Number of cluster centers (k)
  - Number of iterations (m)
  - Number of events for which the model is trained (x)
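
As a rough illustration of how k and m could drive the clustering, here is a
Python sketch of k-means on one-dimensional data. The function name
kmeans_1d, the absolute-difference distance, and the early stop on
convergence are assumptions made for this example, not details of the
extension.

```python
def kmeans_1d(values, k, m):
    """Cluster 1-D `values` into at most k clusters, using at most m iterations."""
    # Deterministic initialization: first k distinct values in arrival order.
    centers = list(dict.fromkeys(values))[:k]
    for _ in range(m):
        # Assignment step: group each value under its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break  # converged before using all m iterations
        centers = updated
    return centers


print(kmeans_1d([1.0, 2.0, 9.0, 10.0, 1.5, 9.5], k=2, m=10))  # [1.5, 9.5]
```

Running the function twice on the same input returns the same centers, since
both the initialization and the tie-breaking (lowest center index wins) are
deterministic.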



The cluster centers are initialized from the first k distinct values among
the events in the stream, where k is the number of cluster centers.

The model is trained for every x events received.
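
One possible reading of this, sketched in Python: events are buffered, and
each time x of them have arrived the model is retrained on that batch. The
class name BatchTrainer, the train_fn callback, and the choice to discard the
batch after training are all assumptions for illustration.

```python
class BatchTrainer:
    """Buffers events and calls `train_fn` each time x new events have arrived."""

    def __init__(self, x, train_fn):
        self.x = x
        self.train_fn = train_fn
        self.buffer = []
        self.model = None  # no model until the first x events are seen

    def on_event(self, value):
        """Record one event and return the current model (None if untrained)."""
        self.buffer.append(value)
        if len(self.buffer) == self.x:
            self.model = self.train_fn(self.buffer)
            self.buffer = []  # start collecting the next batch
        return self.model


# The mean is only a stand-in for the real k-means training step.
trainer = BatchTrainer(3, lambda batch: sum(batch) / len(batch))
models = [trainer.on_event(v) for v in [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]]
print(models)  # [None, None, 2.0, 2.0, 2.0, 20.0]
```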

After the first x events have been received, an output is produced for each
event. The output consists of the value of the cluster center to which the
data point belongs, the id of that cluster center, and the distance from the
data point to that cluster center.
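
The per-event output described above could look like the following Python
sketch. It assumes one-dimensional data, uses the zero-based position in the
list of centers as the cluster id, and measures distance as the absolute
difference; the function name classify is hypothetical.

```python
def classify(value, centers):
    """Return (center_value, center_id, distance) for the nearest center."""
    center_id = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
    center_value = centers[center_id]
    return center_value, center_id, abs(value - center_value)


print(classify(8.0, [1.5, 9.5]))  # (9.5, 1, 1.5)
```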



Clustering can be performed over a given window implementation, i.e. time,
time batch, or length.

-- 
Malith Jayasinghe

WSO2, Inc. (http://wso2.com)
Email   :[email protected]
Mobile :0770704040
Blog     :https://medium.com/@malith.jayasinghe
Lean . Enterprise . Middleware
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
