Hi Fazlan,

Please find my replies inline.

On Wed, Jun 7, 2017 at 3:48 PM, Fazlan Nazeem <[email protected]> wrote:
> Hi Malith,
>
> On Wed, Jun 7, 2017 at 3:04 PM, Malith Jayasinghe <[email protected]> wrote:
>
>> Hello All,
>>
>> We are developing k-means clustering extension. k-means is an
>> unsupervised learning algorithm which provides a simple way to classify a
>> given data set through a certain number of clusters. The standard k-means
>> clustering algorithm is a nondeterministic algorithm. This means that we
>> can get different results for the same input data when we run the algorithm
>> multiple times. The reason is that the algorithm randomly chooses k
>> observations from the data set and uses these as the initial means. Here
>> we implement a variant of k means in which the initial cluster centers
>> are determined by the first k distinct values. This will ensure the same
>> output for a given input.
>>
>> Function Parameters: Data point to be clustered
>>
>> Number of cluster centers - k
>>
>> Number of iterations - m
>>
>> Number of events for which the model is trained - x
>>
>> The cluster centers are initialized based on the first distinct number of
>> k (number of cluster centers) events in the stream.
>>
>> The model is trained for every x events received.
>
> Does this mean at any point in time, the maximum number of input points
> used by the training process is x? Also how is the training process carried
> out? I assume the training doesn't happen in real time.

Training is carried out on the data points accumulated so far, which
depends on the window used. The data is collected over the given window in
an array list: each arriving event adds an element, and once an event
expires from the window, its element is removed from the list. For every x
data points received, the data currently held in the array list is sent for
clustering and new cluster centers are computed. So the training is carried
out in real time, on whatever data is in the array list at the time it is
sent for clustering.

The training process includes:

1. Initializing the cluster centers from the first k distinct data points
   in the data set. If the number of distinct data points is less than k,
   the number of cluster centers is reduced to the number of distinct data
   points.
2. Assigning each data point in the data set to the nearest of the
   available cluster centers.
3. Computing the new value of each cluster center as the average of the
   data points assigned to it.
4. Reassigning the data points and recomputing the cluster centers until
   the cluster center values no longer change or the maximum number of
   iterations is reached.

An option can be given to train the model only on the first x events, or
to retrain it for every x data points received.

>> After receiving the first x events, an output is given for each event
>> generated. The output consists of the cluster centre value to which the
>> data point belongs, the id of the particular cluster center and the
>> distance from the cluster center.
>>
>> The clustering can be performed for a given window implementation i.e.
>> time, time batch, length
>>
>> --
>> Malith Jayasinghe
>>
>> WSO2, Inc. (http://wso2.com)
>> Email :[email protected]
>> Mobile :0770704040
>> Blog :https://medium.com/@malith.jayasinghe
>> Lean . Enterprise . Middleware
>>
>> _______________________________________________
>> Architecture mailing list
>> [email protected]
>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>
> --
> Thanks & Regards,
>
> Fazlan Nazeem
>
> *Senior Software Engineer*
>
> *WSO2 Inc*
> Mobile : +94772338839
> [email protected]

--
Sachini Siriwardene
Software Engineering Intern
+94774274374
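
P.S. To make the training steps and the per-event output described above more concrete, here is a rough sketch in Python. It is only an illustration, not the extension's code: it assumes scalar data points and a length window, and the names train, classify and LengthWindowClusterer are made up for this example.

```python
from collections import deque


def nearest(centers, p):
    # Index of the center closest to p (ties go to the lower index).
    return min(range(len(centers)), key=lambda i: abs(centers[i] - p))


def train(points, k, max_iterations):
    # Step 1: seed with the first k distinct values (fewer if not available).
    centers = []
    for p in points:
        if p not in centers:
            centers.append(p)
        if len(centers) == k:
            break
    for _ in range(max_iterations):
        # Step 2: assign every point to its nearest center.
        sums = [0.0] * len(centers)
        counts = [0] * len(centers)
        for p in points:
            i = nearest(centers, p)
            sums[i] += p
            counts[i] += 1
        # Step 3: each new center is the average of its assigned points.
        new_centers = [
            sums[i] / counts[i] if counts[i] else centers[i]
            for i in range(len(centers))
        ]
        # Step 4: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def classify(centers, p):
    # Output per event: (cluster center value, cluster id, distance).
    i = nearest(centers, p)
    return centers[i], i, abs(centers[i] - p)


class LengthWindowClusterer:
    # Hypothetical wrapper: length window, retrain after every x events.
    def __init__(self, k, max_iterations, train_every, window_length):
        self.k = k
        self.max_iterations = max_iterations
        self.train_every = train_every
        # deque(maxlen=...) drops the oldest event once the window is full.
        self.window = deque(maxlen=window_length)
        self.received = 0
        self.centers = []

    def on_event(self, value):
        self.window.append(value)
        self.received += 1
        if self.received % self.train_every == 0:
            self.centers = train(list(self.window), self.k, self.max_iterations)
        if not self.centers:
            return None  # no output before the first x events
        return classify(self.centers, value)
```

Because the seeding uses the first k distinct values instead of a random choice, calling train twice on the same list gives the same centers. A time or time batch window would only change how events expire from the window, not the training loop.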
