Adding Fazlan.

On Fri, Jun 9, 2017 at 9:54 AM, Sachini Siriwardene <[email protected]> wrote:
> Hi Fazlan,
> Please find my replies inline.
>
> On Wed, Jun 7, 2017 at 3:48 PM, Fazlan Nazeem <[email protected]> wrote:
>
>> Hi Malith,
>>
>> On Wed, Jun 7, 2017 at 3:04 PM, Malith Jayasinghe <[email protected]> wrote:
>>
>>> Hello All,
>>>
>>> We are developing a k-means clustering extension. k-means is an
>>> unsupervised learning algorithm which provides a simple way to
>>> partition a given data set into a certain number of clusters. The
>>> standard k-means clustering algorithm is nondeterministic: we can get
>>> different results for the same input data when we run the algorithm
>>> multiple times. The reason is that the algorithm randomly chooses k
>>> observations from the data set and uses these as the initial means.
>>> Here we implement a variant of k-means in which the initial cluster
>>> centers are determined by the first k distinct values. This ensures
>>> the same output for a given input.
>>>
>>> Function parameters:
>>>
>>> Data point to be clustered
>>> Number of cluster centers - k
>>> Number of iterations - m
>>> Number of events for which the model is trained - x
>>>
>>> The cluster centers are initialized from the first k distinct events
>>> in the stream, where k is the number of cluster centers.
>>>
>>> The model is trained for every x events received.
>>
>> Does this mean that, at any point in time, the maximum number of input
>> points used by the training process is x? Also, how is the training
>> process carried out? I assume the training doesn't happen in real time.
>
> Training is carried out on the data points accumulated so far, which
> depends on the window used. The data is collected over a given window
> size by updating an array list.
>
> Once an event expires from the window, the corresponding element is
> removed from the array list.
>
> For every x data points received, the data accumulated in the array
> list is sent to be clustered and new cluster centers are computed. The
> training is carried out in real time, on the data available in the
> array list at the time it is sent for clustering.
>
> The training process includes:
>
>    1. Initializing the cluster centers from the first k distinct data
>       points in the data set. If there are fewer than k distinct data
>       points, the number of cluster centers is set to the number of
>       distinct data points.
>    2. Assigning the data points in the given data set to the nearest
>       of the available cluster centers.
>    3. Computing the new cluster centers by taking the average value of
>       the data assigned to each cluster center.
>    4. Reassigning the values in the data set and recomputing the
>       cluster centers until the cluster center values no longer change
>       or the number of iterations is reached.
>
> An option can be given to train the model on only the first x events,
> or to retrain it for every x data points received.
>
>>
>>> After receiving the first x events, an output is given for each event
>>> generated. The output consists of the cluster center value to which
>>> the data point belongs, the id of that cluster center, and the
>>> distance from the cluster center.
>>>
>>> The clustering can be performed for a given window implementation,
>>> i.e. time, time batch, or length.
>>>
>>> --
>>> Malith Jayasinghe
>>>
>>> WSO2, Inc. (http://wso2.com)
>>> Email :[email protected]
>>> Mobile :0770704040
>>> Blog :https://medium.com/@malith.jayasinghe
>>> Lean . Enterprise . Middleware
>>>
>>> _______________________________________________
>>> Architecture mailing list
>>> [email protected]
>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>
>> --
>> Thanks & Regards,
>>
>> Fazlan Nazeem
>>
>> *Senior Software Engineer*
>>
>> *WSO2 Inc*
>> Mobile : +94772338839
>> [email protected]
>
> --
> Sachini Siriwardene
> Software Engineering Intern
>
> +94774274374

--
Malith Jayasinghe
WSO2, Inc. (http://wso2.com)
Email :[email protected]
Mobile :0770704040
Blog :https://medium.com/@malith.jayasinghe
Lean . Enterprise . Middleware
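
For reference, the training procedure and per-event output discussed in this thread could be sketched roughly as below. This is only an illustration in Python, not the extension's actual code. It assumes one-dimensional numeric data points and a length window, and every name in it (`initial_centers`, `nearest`, `train`, `StreamingKMeans`, `train_every`, `window_length`) is hypothetical.

```python
from collections import deque


def initial_centers(points, k):
    """Use the first k distinct values, in arrival order, as the centers."""
    centers = []
    for p in points:
        if p not in centers:
            centers.append(p)
            if len(centers) == k:
                break
    return centers  # holds fewer than k values if the data has fewer distinct ones


def nearest(centers, p):
    """Return (cluster id, distance) of the center closest to p."""
    idx = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
    return idx, abs(p - centers[idx])


def train(points, k, max_iterations):
    """Deterministic 1-D k-means over the buffered points."""
    centers = initial_centers(points, k)
    for _ in range(max_iterations):
        groups = [[] for _ in centers]
        for p in points:
            idx, _ = nearest(centers, p)
            groups[idx].append(p)
        # New center = average of its points; keep the old one if it got none.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:  # centers stopped changing
            break
        centers = updated
    return centers


class StreamingKMeans:
    """Buffers events in a length window and retrains every `train_every` events."""

    def __init__(self, k, max_iterations, train_every, window_length):
        self.k = k
        self.max_iterations = max_iterations
        self.train_every = train_every
        # Assumption: a length window; the oldest event expires automatically.
        self.window = deque(maxlen=window_length)
        self.seen = 0
        self.centers = []

    def on_event(self, value):
        self.window.append(value)
        self.seen += 1
        if self.seen % self.train_every == 0:
            self.centers = train(list(self.window), self.k, self.max_iterations)
        if not self.centers:
            return None  # no output before the first training round
        idx, distance = nearest(self.centers, value)
        return self.centers[idx], idx, distance
```

Because the initial centers come from the data order rather than a random draw, calling `train` twice on the same buffer gives the same centers. A time or time-batch window would need a different expiry rule than the `deque(maxlen=...)` used here.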
