Hi Sachini,

Okay. I think I misread the "every x events" part previously. This means that if x is 100, then by the time 200 events have been received we would have trained 2 models in total. +1 if that is the case.
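To make the counting concrete, here is a minimal Python sketch of how I read the "every x events" trigger. It is not the extension's code: `retrain_points` and its parameters are hypothetical names, and a plain length window is assumed for simplicity.

```python
from collections import deque


def retrain_points(events, x, window_length):
    """Return (event_count, window_snapshot) for each retraining trigger.

    Hypothetical helper: assumes a length window that keeps only the
    most recent `window_length` events.
    """
    if x <= 0:
        raise ValueError("x must be a positive number of events")
    # A bounded deque drops the oldest event once the window is full,
    # which mimics an event expiring from the window.
    window = deque(maxlen=window_length)
    triggers = []
    for count, value in enumerate(events, start=1):
        window.append(value)
        # Every x-th event, whatever is currently in the window is
        # what would be sent for clustering.
        if count % x == 0:
            triggers.append((count, list(window)))
    return triggers
```

With x = 100 and 200 events received, this yields two training rounds (at events 100 and 200), and each round sees only the events still held in the window at that moment.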
On Fri, Jun 9, 2017 at 9:56 AM, Malith Jayasinghe <[email protected]> wrote:

> adding Fazlan
>
> On Fri, Jun 9, 2017 at 9:54 AM, Sachini Siriwardene <[email protected]>
> wrote:
>
>> Hi Fazlan,
>> Please find my replies inline.
>>
>> On Wed, Jun 7, 2017 at 3:48 PM, Fazlan Nazeem <[email protected]> wrote:
>>
>>> Hi Malith,
>>>
>>> On Wed, Jun 7, 2017 at 3:04 PM, Malith Jayasinghe <[email protected]>
>>> wrote:
>>>
>>>> Hello All,
>>>>
>>>> We are developing a k-means clustering extension. k-means is an
>>>> unsupervised learning algorithm which provides a simple way to
>>>> classify a given data set through a certain number of clusters.
>>>> The standard k-means clustering algorithm is a nondeterministic
>>>> algorithm. This means that we can get different results for the
>>>> same input data when we run the algorithm multiple times. The
>>>> reason is that the algorithm randomly chooses k observations from
>>>> the data set and uses these as the initial means. Here we
>>>> implement a variant of k-means in which the initial cluster
>>>> centers are determined by the first k distinct values. This will
>>>> ensure the same output for a given input.
>>>>
>>>> Function Parameters:
>>>>   Data point to be clustered
>>>>   Number of cluster centers - k
>>>>   Number of iterations - m
>>>>   Number of events for which the model is trained - x
>>>>
>>>> The cluster centers are initialized based on the first k distinct
>>>> events in the stream (k being the number of cluster centers).
>>>>
>>>> The model is trained for every x events received.
>>>
>>> Does this mean at any point in time, the maximum number of input
>>> points used by the training process is x? Also how is the training
>>> process carried out? I assume the training doesn't happen in real
>>> time.
>>
>> Training is carried out on the number of data points accumulated,
>> depending on the window used. The data is collected over a given
>> window size, by updating an array list.
>>
>> Once an event is expired from the window, an element is removed from
>> the array list.
>>
>> For every x number of data points received, the data accumulated in
>> the array list is sent to be clustered and new cluster centers are
>> computed. The training is carried out in real time, for the data
>> available in the array list at the time it is sent for clustering.
>>
>> The training process includes:
>>
>>   1. Initializing the cluster centers based on the first k distinct
>>      data points in the data set. If the number of distinct data
>>      points is less than k, the number of cluster centers will be
>>      initialized to the number of distinct data points.
>>   2. The data points in the given data set are assigned to the
>>      available cluster centers.
>>   3. The new cluster centers are computed for the assigned data for
>>      each cluster center by taking the average value.
>>   4. The values in the data set are reassigned and cluster centers
>>      recomputed until the cluster center values do not change or the
>>      number of iterations is reached.
>>
>> An option can be given to train the model for only the first x
>> number of events or train it for each x data points received.
>>
>>>> After receiving the first x events, an output is given for each
>>>> event generated. The output consists of the cluster centre value
>>>> to which the data point belongs, the id of the particular cluster
>>>> center and the distance from the cluster center.
>>>>
>>>> The clustering can be performed for a given window implementation
>>>> i.e. time, time batch, length
>>>>
>>>> --
>>>> Malith Jayasinghe
>>>>
>>>> WSO2, Inc. (http://wso2.com)
>>>> Email : [email protected]
>>>> Mobile : 0770704040
>>>> Blog : https://medium.com/@malith.jayasinghe
>>>> Lean . Enterprise . Middleware
>>>>
>>>> _______________________________________________
>>>> Architecture mailing list
>>>> [email protected]
>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>
>>> --
>>> Thanks & Regards,
>>>
>>> Fazlan Nazeem
>>>
>>> *Senior Software Engineer*
>>>
>>> *WSO2 Inc*
>>> Mobile : +94772338839
>>> [email protected]
>>
>> --
>> Sachini Siriwardene
>> Software Engineering Intern
>>
>> +94774274374

--
Thanks & Regards,

Fazlan Nazeem

*Senior Software Engineer*

*WSO2 Inc*
Mobile : +94772338839
[email protected]
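For reference, the training steps listed in the quoted thread can be sketched roughly as follows. This is only an illustration under stated assumptions, not the extension's implementation: data points are taken to be one-dimensional numbers, distance is the absolute difference, ties go to the lower cluster id, and the names `init_centers`, `train` and `classify` are hypothetical.

```python
def init_centers(points, k):
    """Step 1: use the first k distinct values as the initial centers."""
    centers = []
    for p in points:
        if p not in centers:
            centers.append(p)
        if len(centers) == k:
            break
    # Fewer than k distinct values simply yields fewer centers.
    return centers


def nearest(point, centers):
    """Index of the closest center (ties resolved to the lowest index)."""
    return min(range(len(centers)), key=lambda i: abs(point - centers[i]))


def train(points, k, max_iterations):
    """Steps 2 to 4: assign, average, repeat until stable or m iterations."""
    centers = init_centers(points, k)
    if not centers:
        return centers
    for _ in range(max_iterations):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Assumption: a center that received no points keeps its old value.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def classify(point, centers):
    """Output per event: (center value, cluster id, distance to center)."""
    if not centers:
        raise ValueError("the model has not been trained yet")
    idx = nearest(point, centers)
    return centers[idx], idx, abs(point - centers[idx])
```

Because the initial centers come from the data order rather than a random draw, calling `train` twice on the same input returns the same centers, which is the property the proposal is after.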
_______________________________________________ Architecture mailing list [email protected] https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
