Hi Fazlan,

Yes, that is what happens.

On Fri, Jun 9, 2017 at 10:52 AM, Fazlan Nazeem <[email protected]> wrote:
> Hi Sachini,
>
> Okay. I think I misread the "every x events" part previously. This means
> that if x is 100, then when 200 events have been received we would have had
> 2 models in total. +1 if that is the case.
>
>
> On Fri, Jun 9, 2017 at 9:56 AM, Malith Jayasinghe <[email protected]>
> wrote:
>
>> adding Fazlan
>>
>> On Fri, Jun 9, 2017 at 9:54 AM, Sachini Siriwardene <[email protected]>
>> wrote:
>>
>>> Hi Fazlan,
>>> Please find my replies inline.
>>>
>>> On Wed, Jun 7, 2017 at 3:48 PM, Fazlan Nazeem <[email protected]> wrote:
>>>
>>>> Hi Malith,
>>>>
>>>>
>>>> On Wed, Jun 7, 2017 at 3:04 PM, Malith Jayasinghe <[email protected]>
>>>> wrote:
>>>>
>>>>> Hello All,
>>>>>
>>>>> We are developing a k-means clustering extension. k-means is an
>>>>> unsupervised learning algorithm which provides a simple way to classify
>>>>> a given data set into a certain number of clusters. The standard k-means
>>>>> clustering algorithm is nondeterministic: we can get different results
>>>>> for the same input data when we run the algorithm multiple times. The
>>>>> reason is that the algorithm randomly chooses k observations from the
>>>>> data set and uses these as the initial means. Here we implement a
>>>>> variant of k-means in which the initial cluster centers are determined
>>>>> by the first k distinct values. This ensures the same output for a
>>>>> given input.
>>>>>
>>>>> Function parameters:
>>>>>
>>>>>   Data point to be clustered
>>>>>   Number of cluster centers - k
>>>>>   Number of iterations - m
>>>>>   Number of events for which the model is trained - x
>>>>>
>>>>> The cluster centers are initialized from the first k distinct values
>>>>> (k being the number of cluster centers) in the event stream.
>>>>>
>>>>> The model is trained for every x events received.
>>>>>
>>>>
>>>> Does this mean that, at any point in time, the maximum number of input
>>>> points used by the training process is x?
>>>> Also, how is the training process carried out? I assume the training
>>>> doesn't happen in real time.
>>>>
>>>
>>> Training is carried out on the data points accumulated in the window, so
>>> the amount of training data depends on the window used. The data is
>>> collected over the given window by updating an array list.
>>>
>>> Once an event expires from the window, the corresponding element is
>>> removed from the array list.
>>>
>>> For every x data points received, the data accumulated in the array list
>>> is sent to be clustered and new cluster centers are computed. The training
>>> is carried out in real time, on the data available in the array list at
>>> the time it is sent for clustering.
>>>
>>> The training process includes:
>>>
>>> 1. Initializing the cluster centers from the first k distinct data points
>>>    in the data set. If there are fewer than k distinct data points, the
>>>    number of cluster centers is set to the number of distinct data points.
>>> 2. Assigning each data point in the data set to its nearest cluster
>>>    center.
>>> 3. Recomputing each cluster center as the average of the data points
>>>    assigned to it.
>>> 4. Reassigning the data points and recomputing the cluster centers until
>>>    the cluster center values no longer change or the number of
>>>    iterations (m) is reached.
>>>
>>> An option can be given to train the model only once, on the first x
>>> events, or to retrain it for every x data points received.
>>>
>>>>
>>>>> After receiving the first x events, an output is given for each incoming
>>>>> event. The output consists of the value of the cluster center to which
>>>>> the data point belongs, the id of that cluster center and the distance
>>>>> from the cluster center.
>>>>>
>>>>> The clustering can be performed for a given window implementation, i.e.
>>>>> time, time batch, length.
>>>>>
>>>>> --
>>>>> Malith Jayasinghe
>>>>>
>>>>> WSO2, Inc. (http://wso2.com)
>>>>> Email: [email protected]
>>>>> Mobile: 0770704040
>>>>> Blog: https://medium.com/@malith.jayasinghe
>>>>> Lean . Enterprise . Middleware
>>>>>
>>>>> _______________________________________________
>>>>> Architecture mailing list
>>>>> [email protected]
>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>
>>>> --
>>>> Thanks & Regards,
>>>>
>>>> Fazlan Nazeem
>>>>
>>>> *Senior Software Engineer*
>>>>
>>>> *WSO2 Inc*
>>>> Mobile: +94772338839
>>>> [email protected]
>>>
>>> --
>>> Sachini Siriwardene
>>> Software Engineering Intern
>>>
>>> +94774274374

--
Sachini Siriwardene
Software Engineering Intern
+94774274374
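For illustration, the training procedure described in the thread (steps 1 to 4 in Sachini's reply) might be sketched in Python as below. This is a minimal sketch, not the extension's actual code: it assumes one-dimensional numeric values with absolute difference as the distance, and the function names `init_centers`, `nearest` and `train` are hypothetical.

```python
from typing import List, Sequence


def init_centers(data: Sequence[float], k: int) -> List[float]:
    """Step 1: take the first k distinct values, in arrival order, as centers.

    If the data holds fewer than k distinct values, fewer centers are returned.
    """
    centers: List[float] = []
    for value in data:
        if value not in centers:
            centers.append(value)
            if len(centers) == k:
                break
    return centers


def nearest(value: float, centers: Sequence[float]) -> int:
    """Index of the closest center (ties go to the lower index)."""
    return min(range(len(centers)), key=lambda i: abs(value - centers[i]))


def train(data: Sequence[float], k: int, max_iterations: int) -> List[float]:
    """Steps 1-4: deterministic k-means on one-dimensional values."""
    centers = init_centers(data, k)
    if not centers:
        return centers
    for _ in range(max_iterations):
        # Step 2: assign every point to its nearest center.
        groups: List[List[float]] = [[] for _ in centers]
        for value in data:
            groups[nearest(value, centers)].append(value)
        # Step 3: recompute each center as the mean of its points;
        # a center that received no points keeps its previous value.
        new_centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        # Step 4: stop early once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers
```

With data `[1, 1, 2, 10, 11, 12]`, k = 2 and m = 10, the centers start at 1 and 2, move to 1.0 and 8.75, then to 4/3 and 11.0, where the third iteration detects no change. Because initialization depends only on the order of the data, repeated runs on the same input give the same centers.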
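The streaming side described in the thread works as follows: keep the window's points in a list, drop a point when its event expires, retrain every x events (or only once, after the first x), and, after the first x events, emit the center value, center id and distance for each event. It could look roughly like the sketch below. `WindowedClusterer`, its methods and the `retrain_every_x` flag are hypothetical names. The sketch also assumes one-dimensional values, oldest-first expiry (as in length and time windows), 0-based cluster ids, and x >= 1. It accepts any training function that maps the window's points to a list of centers.

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple

# Any function mapping the window's points to a list of cluster centers.
Trainer = Callable[[List[float]], List[float]]


class WindowedClusterer:
    """Hypothetical wrapper around a k-means trainer for a sliding window.

    Keeps the points of the current window, retrains every x arrivals (or
    only once, after the first x arrivals) and, once a model exists, emits
    (center value, center id, distance) for each incoming point.
    """

    def __init__(self, train: Trainer, x: int, retrain_every_x: bool = True) -> None:
        self.train = train
        self.x = x  # assumed to be >= 1
        self.retrain_every_x = retrain_every_x
        self.window: Deque[float] = deque()  # stands in for the array list
        self.received = 0
        self.centers: List[float] = []

    def on_expired(self) -> None:
        """The window expired its oldest event (assumed FIFO): drop that point."""
        if self.window:
            self.window.popleft()

    def on_arrival(self, value: float) -> Optional[Tuple[float, int, float]]:
        """Add a point, retrain on every x-th arrival, then classify the point."""
        self.window.append(value)
        self.received += 1
        due = self.received % self.x == 0
        if due and (self.retrain_every_x or self.received == self.x):
            # Train on whatever the window holds right now.
            self.centers = self.train(list(self.window))
        if not self.centers:
            return None  # fewer than x events so far: no model, no output
        # Nearest center (ties go to the lower id).
        cid = min(range(len(self.centers)), key=lambda i: abs(value - self.centers[i]))
        center = self.centers[cid]
        return center, cid, abs(value - center)
```

Passing `retrain_every_x=False` models the option mentioned in the thread of training only on the first x events. With the default, the model is rebuilt from the current window contents on every x-th arrival, so the training set size is bounded by the window, not by x.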
