Should we add an option to enable/disable continuous learning? If it is "on", training happens after every x events; otherwise it happens only once, after the first x events.
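To make the question concrete, here is a minimal sketch of the two behaviours. It assumes 1-D data points and a length window; `train_kmeans`, `StreamingKMeans` and the `continuous` flag are hypothetical names used only for illustration, not the extension's actual API, and Python is used only for brevity.

```python
from collections import deque


def train_kmeans(points, k, max_iterations):
    """Deterministic 1-D k-means: initial centers are the first k distinct values."""
    centers = []
    for p in points:
        if p not in centers:
            centers.append(p)
        if len(centers) == k:
            break
    for _ in range(max_iterations):
        # Assign every point to its closest center (assumption: absolute distance).
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # Recompute each center as the average of the points assigned to it.
        new_centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break  # centers no longer change
        centers = new_centers
    return centers


class StreamingKMeans:
    """Hypothetical wrapper: trains every x events, or only once if continuous is off."""

    def __init__(self, k, max_iterations, x, window_length, continuous=True):
        self.k = k
        self.max_iterations = max_iterations
        self.x = x
        self.continuous = continuous
        # Length window: the oldest event is dropped once the window is full.
        self.window = deque(maxlen=window_length)
        self.events_seen = 0
        self.models_trained = 0
        self.centers = []

    def on_event(self, value):
        self.window.append(value)
        self.events_seen += 1
        due = self.events_seen % self.x == 0
        if due and (self.continuous or self.models_trained == 0):
            self.centers = train_kmeans(list(self.window), self.k, self.max_iterations)
            self.models_trained += 1
        if not self.centers:
            return None  # no model yet: fewer than x events received
        idx = min(range(len(self.centers)), key=lambda i: abs(value - self.centers[i]))
        # Output: cluster center value, cluster id, distance from the center.
        return self.centers[idx], idx, abs(value - self.centers[idx])
```

With x = 100 and 200 events received, `continuous=True` trains two models in this sketch (the case Fazlan describes below), while `continuous=False` trains only the first one and keeps using it.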
On Fri, Jun 9, 2017 at 11:04 AM, Sachini Siriwardene <[email protected]> wrote:
> Hi Fazlan,
> Yes, that is what happens.
>
> On Fri, Jun 9, 2017 at 10:52 AM, Fazlan Nazeem <[email protected]> wrote:
>
>> Hi Sachini,
>>
>> Okay. I think I misread the "every x events" part previously. This means
>> that if x is 100, then when 200 events have been received we would have
>> 2 models in total. +1 if that is the case.
>>
>> On Fri, Jun 9, 2017 at 9:56 AM, Malith Jayasinghe <[email protected]>
>> wrote:
>>
>>> adding Fazlan
>>>
>>> On Fri, Jun 9, 2017 at 9:54 AM, Sachini Siriwardene <[email protected]>
>>> wrote:
>>>
>>>> Hi Fazlan,
>>>> Please find my replies inline.
>>>>
>>>> On Wed, Jun 7, 2017 at 3:48 PM, Fazlan Nazeem <[email protected]> wrote:
>>>>
>>>>> Hi Malith,
>>>>>
>>>>> On Wed, Jun 7, 2017 at 3:04 PM, Malith Jayasinghe <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hello All,
>>>>>>
>>>>>> We are developing a k-means clustering extension. k-means is an
>>>>>> unsupervised learning algorithm which provides a simple way to
>>>>>> group a given data set into a certain number of clusters. The
>>>>>> standard k-means clustering algorithm is a nondeterministic
>>>>>> algorithm. This means that we can get different results for the
>>>>>> same input data when we run the algorithm multiple times. The
>>>>>> reason is that the algorithm randomly chooses k observations from
>>>>>> the data set and uses these as the initial means. Here we implement
>>>>>> a variant of k-means in which the initial cluster centers are
>>>>>> determined by the first k distinct values. This ensures the same
>>>>>> output for a given input.
>>>>>>
>>>>>> Function Parameters:
>>>>>>
>>>>>> Data point to be clustered
>>>>>> Number of cluster centers - k
>>>>>> Number of iterations - m
>>>>>> Number of events for which the model is trained - x
>>>>>>
>>>>>> The cluster centers are initialized based on the first k distinct
>>>>>> events (k being the number of cluster centers) in the stream.
>>>>>>
>>>>>> The model is trained for every x events received.
>>>>>
>>>>> Does this mean that at any point in time, the maximum number of input
>>>>> points used by the training process is x? Also, how is the training
>>>>> process carried out? I assume the training doesn't happen in real time.
>>>>
>>>> Training is carried out on the number of data points accumulated,
>>>> depending on the window used. The data is collected over a given window
>>>> size by updating an array list.
>>>>
>>>> Once an event expires from the window, an element is removed from
>>>> the array list.
>>>>
>>>> For every x data points received, the data accumulated in the array
>>>> list is sent to be clustered and new cluster centers are computed.
>>>> The training is carried out in real time, on the data available in the
>>>> array list at the time it is sent for clustering.
>>>>
>>>> The training process includes:
>>>>
>>>> 1. Initializing the cluster centers based on the first k distinct data
>>>>    points in the data set. If the number of distinct data points is
>>>>    less than k, the number of cluster centers is set to the number of
>>>>    distinct data points.
>>>> 2. The data points in the given data set are assigned to the available
>>>>    cluster centers.
>>>> 3. The new cluster centers are computed for the data assigned to each
>>>>    cluster center by taking the average value.
>>>> 4. The values in the data set are reassigned and the cluster centers
>>>>    recomputed until the cluster center values do not change or the
>>>>    number of iterations is reached.
>>>>
>>>> An option can be given to train the model for only the first x events
>>>> or to train it for every x data points received.
>>>>
>>>>>> After receiving the first x events, an output is given for each event
>>>>>> generated. The output consists of the cluster center value to which
>>>>>> the data point belongs, the id of the particular cluster center and
>>>>>> the distance from the cluster center.
>>>>>>
>>>>>> The clustering can be performed for a given window implementation,
>>>>>> i.e. time, time batch, length.
>>>>>>
>>>>>> --
>>>>>> Malith Jayasinghe
>>>>>>
>>>>>> WSO2, Inc. (http://wso2.com)
>>>>>> Email :[email protected]
>>>>>> Mobile :0770704040
>>>>>> Blog :https://medium.com/@malith.jayasinghe
>>>>>> Lean . Enterprise . Middleware
>>>>>>
>>>>>> _______________________________________________
>>>>>> Architecture mailing list
>>>>>> [email protected]
>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>
>>>>> --
>>>>> Thanks & Regards,
>>>>>
>>>>> Fazlan Nazeem
>>>>>
>>>>> *Senior Software Engineer*
>>>>>
>>>>> *WSO2 Inc*
>>>>> Mobile : +94772338839
>>>>> [email protected]
>>>>
>>>> --
>>>> Sachini Siriwardene
>>>> Software Engineering Intern
>>>>
>>>> +94774274374

--
Malith Jayasinghe

WSO2, Inc. (http://wso2.com)
Email :[email protected]
Mobile :0770704040
Blog :https://medium.com/@malith.jayasinghe
Lean . Enterprise . Middleware
_______________________________________________ Architecture mailing list [email protected] https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
