Should we add an option to enable or disable continuous learning? If it is
"on", training happens after every x events; otherwise it happens only once,
after the first x events.
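
A minimal sketch of how such a switch could decide when to train. The
function name should_train and the flag name continuous are made up for
illustration and are not part of the extension:

```python
def should_train(event_count: int, x: int, continuous: bool) -> bool:
    """Return True if the model should be (re)trained after this event.

    event_count is the 1-based number of events received so far and x is
    the training interval. All names here are hypothetical.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    if continuous:
        # "on": train after every x events (x, 2x, 3x, ...)
        return event_count % x == 0
    # "off": train once, after the first x events only
    return event_count == x


# With x = 100 and 200 events received, count how many training runs happen.
trainings_on = sum(should_train(n, 100, True) for n in range(1, 201))
trainings_off = sum(should_train(n, 100, False) for n in range(1, 201))
```

With the option on this gives two training runs for 200 events, and with it
off only one.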

On Fri, Jun 9, 2017 at 11:04 AM, Sachini Siriwardene <[email protected]>
wrote:

> Hi Fazlan,
> Yes, that is what happens.
>
> On Fri, Jun 9, 2017 at 10:52 AM, Fazlan Nazeem <[email protected]> wrote:
>
>> Hi Sachini,
>>
>> Okay. I think I misread the "every x events" part previously. This means
>> that if x is 100, we would have 2 models in total once 200 events have
>> been received. +1 if that is the case.
>>
>>
>> On Fri, Jun 9, 2017 at 9:56 AM, Malith Jayasinghe <[email protected]>
>> wrote:
>>
>>> adding Fazlan
>>>
>>> On Fri, Jun 9, 2017 at 9:54 AM, Sachini Siriwardene <[email protected]>
>>> wrote:
>>>
>>>> Hi Fazlan,
>>>> Please find my replies inline.
>>>>
>>>> On Wed, Jun 7, 2017 at 3:48 PM, Fazlan Nazeem <[email protected]> wrote:
>>>>
>>>>> Hi Malith,
>>>>>
>>>>>
>>>>> On Wed, Jun 7, 2017 at 3:04 PM, Malith Jayasinghe <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hello All,
>>>>>>
>>>>>>
>>>>>>
>>>>>> We are developing a k-means clustering extension. k-means is an
>>>>>> unsupervised learning algorithm that provides a simple way to partition
>>>>>> a given data set into a chosen number of clusters. The standard k-means
>>>>>> algorithm is nondeterministic: running it several times on the same
>>>>>> input data can give different results, because it randomly chooses k
>>>>>> observations from the data set as the initial means. Here we implement
>>>>>> a variant of k-means in which the initial cluster centers are the first
>>>>>> k distinct values in the data. This ensures the same output for a given
>>>>>> input.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Function parameters:
>>>>>>
>>>>>> Data point to be clustered
>>>>>>
>>>>>> Number of cluster centers - k
>>>>>>
>>>>>> Maximum number of iterations - m
>>>>>>
>>>>>> Number of events after which the model is trained - x
>>>>>>
>>>>>>
>>>>>>
>>>>>> The cluster centers are initialized from the first k distinct events
>>>>>> in the stream, where k is the number of cluster centers.
>>>>>>
>>>>>> The model is trained after every x events received.
>>>>>>
>>>>>
>>>>> Does this mean at any point in time, the maximum number of input
>>>>> points used by the training process is x? Also how is the training process
>>>>> carried out? I assume the training doesn't happen in real time.
>>>>>
>>>>
>>>>   Training is carried out on the data points accumulated in the window
>>>> being used. The data is collected over the given window size in an array
>>>> list.
>>>>
>>>> When an event expires from the window, the corresponding element is
>>>> removed from the array list.
>>>>
>>>> For every x data points received, the data accumulated in the array list
>>>> is sent to be clustered and new cluster centers are computed. Training is
>>>> carried out in real time, on the data available in the array list at the
>>>> time it is sent for clustering.
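
The buffering described above could look roughly like the sketch below. It
assumes a length window only; the class name LengthWindowBuffer and its
methods are hypothetical, and a deque stands in for the array list so that
the oldest point expires automatically:

```python
from collections import deque


class LengthWindowBuffer:
    """Keeps the last window_size points and signals training every x points."""

    def __init__(self, window_size: int, x: int):
        # deque(maxlen=...) drops the oldest point when a new one arrives,
        # which plays the role of an event expiring from a length window.
        self.points = deque(maxlen=window_size)
        self.x = x
        self.received = 0

    def add(self, point: float):
        """Add a point; return a snapshot to train on every x points, else None."""
        self.points.append(point)
        self.received += 1
        if self.received % self.x == 0:
            # Train on whatever is in the window at this moment.
            return list(self.points)
        return None


buf = LengthWindowBuffer(window_size=5, x=3)
snapshots = [buf.add(float(i)) for i in range(1, 7)]
```

Here the third point triggers training on the first three points, and the
sixth point triggers training on points 2 to 6, since point 1 has already
left the window.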
>>>>
>>>> The training process includes:
>>>>
>>>>    1. Initializing the cluster centers from the first k distinct data
>>>>       points in the data set. If the number of distinct data points is
>>>>       less than k, the number of cluster centers is set to the number of
>>>>       distinct data points.
>>>>    2. Assigning the data points in the data set to the available cluster
>>>>       centers.
>>>>    3. Computing the new cluster centers by taking the average value of the
>>>>       data assigned to each cluster center.
>>>>    4. Reassigning the values in the data set and recomputing the cluster
>>>>       centers until the cluster center values no longer change or the
>>>>       number of iterations is reached.
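
The four training steps can be sketched as follows. This is an illustration
only, not the extension's code: it assumes one-dimensional numeric data and
assignment to the nearest center by absolute distance, and the name
train_kmeans is made up. Keeping the old center when a cluster ends up empty
is also an assumption.

```python
def train_kmeans(data, k, max_iterations):
    """Deterministic 1-D k-means seeded with the first k distinct values."""
    # Step 1: take the first k distinct values as initial centers. If there
    # are fewer than k distinct values, fewer centers are used.
    centers = []
    for value in data:
        if value not in centers:
            centers.append(value)
        if len(centers) == k:
            break

    for _ in range(max_iterations):
        # Step 2: assign every point to its nearest center
        # (ties go to the center with the lower index).
        clusters = [[] for _ in centers]
        for value in data:
            nearest = min(range(len(centers)),
                          key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)

        # Step 3: each new center is the average of its assigned points.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]

        # Step 4: stop early once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers


centers = train_kmeans([1.0, 2.0, 9.0, 10.0, 1.5, 9.5], k=2, max_iterations=10)
```

Because the seeding depends only on the order of the input, running this
twice on the same data gives the same centers.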
>>>>
>>>>
>>>>
>>>> An option can be given to train the model only on the first x events
>>>> or to retrain it for every x data points received.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>>> After the first x events have been received, an output is produced for
>>>>>> each event. The output consists of the value of the cluster center to
>>>>>> which the data point belongs, the id of that cluster center, and the
>>>>>> distance from the cluster center.
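
As an illustration of that output, the sketch below returns the three values
for a single data point. It assumes one-dimensional data, absolute distance,
and cluster ids equal to the index of the center in a list; the function name
assign is hypothetical:

```python
def assign(value, centers):
    """Return (center_value, cluster_id, distance) for the nearest center."""
    if not centers:
        raise ValueError("model has not been trained yet")
    # The cluster id is taken to be the index of the nearest center.
    cluster_id = min(range(len(centers)),
                     key=lambda i: abs(value - centers[i]))
    center = centers[cluster_id]
    return center, cluster_id, abs(value - center)


result = assign(8.0, [1.5, 9.5])
```

For the point 8.0 and centers 1.5 and 9.5, the nearest center is 9.5 with id
1, at a distance of 1.5.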
>>>>>>
>>>>>>
>>>>>>
>>>>>> The clustering can be performed over a given window implementation,
>>>>>> i.e. time, time batch, or length.
>>>>>>
>>>>>> --
>>>>>> Malith Jayasinghe
>>>>>>
>>>>>> WSO2, Inc. (http://wso2.com)
>>>>>> Email   :[email protected]
>>>>>> Mobile :0770704040
>>>>>> Blog     :https://medium.com/@malith.jayasinghe
>>>>>> Lean . Enterprise . Middleware
>>>>>>
>>>>>> _______________________________________________
>>>>>> Architecture mailing list
>>>>>> [email protected]
>>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Thanks & Regards,
>>>>>
>>>>> Fazlan Nazeem
>>>>>
>>>>> *Senior Software Engineer*
>>>>>
>>>>> *WSO2 Inc*
>>>>> Mobile : +94772338839
>>>>> [email protected]
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Sachini Siriwardene
>>>> Software Engineering Intern
>>>>
>>>> +94774274374
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
>
>
>



-- 
Malith Jayasinghe

WSO2, Inc. (http://wso2.com)
Email   :[email protected]
Mobile :0770704040
Blog     :https://medium.com/@malith.jayasinghe
Lean . Enterprise . Middleware
