Hi Sachini,

Okay. I think I misread the "every x events" part previously. This means
that if x is 100, then by the time 200 events have been received the model
will have been trained twice, so we would have 2 models in total. +1 if
that is the case.
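
To make that reading concrete, here is a minimal sketch of the retraining
cadence. It assumes a length window of 150 events; the function name, the
window size and the use of a deque are illustrative assumptions, not taken
from the extension itself.

```python
from collections import deque


def count_retrains(events, x, window_length):
    """Return (number of models trained, size of each training set).

    Hypothetical helper: a length window holds the most recent
    `window_length` events, and training is triggered every x events.
    """
    window = deque(maxlen=window_length)  # oldest event expires first
    training_sizes = []
    for received, event in enumerate(events, start=1):
        window.append(event)
        if received % x == 0:
            # Retrain on whatever the window holds at this moment.
            training_sizes.append(len(window))
    return len(training_sizes), training_sizes


models, sizes = count_retrains(range(200), x=100, window_length=150)
print(models, sizes)  # 2 [100, 150]
```

Under these assumptions, 200 events with x = 100 give 2 models, and the
number of points used for training is bounded by the window, not by x.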


On Fri, Jun 9, 2017 at 9:56 AM, Malith Jayasinghe <[email protected]> wrote:

> adding Fazlan
>
> On Fri, Jun 9, 2017 at 9:54 AM, Sachini Siriwardene <[email protected]>
> wrote:
>
>> Hi Fazlan,
>> Please find my replies inline.
>>
>> On Wed, Jun 7, 2017 at 3:48 PM, Fazlan Nazeem <[email protected]> wrote:
>>
>>> Hi Malith,
>>>
>>>
>>> On Wed, Jun 7, 2017 at 3:04 PM, Malith Jayasinghe <[email protected]>
>>> wrote:
>>>
>>>> Hello All,
>>>>
>>>>
>>>>
>>>> We are developing a k-means clustering extension. k-means is an
>>>> unsupervised learning algorithm which provides a simple way to partition
>>>> a given data set into a certain number of clusters. The standard k-means
>>>> clustering algorithm is nondeterministic: we can get different results
>>>> for the same input data when we run the algorithm multiple times. The
>>>> reason is that the algorithm randomly chooses k observations from the
>>>> data set and uses these as the initial means. Here we implement a variant
>>>> of k-means in which the initial cluster centers are determined by the
>>>> first k distinct values. This ensures the same output for a given input.
>>>>
>>>>
>>>>
>>>> Function parameters:
>>>>
>>>> Data point to be clustered
>>>>
>>>> Number of cluster centers - k
>>>>
>>>> Number of iterations - m
>>>>
>>>> Number of events for which the model is trained - x
>>>>
>>>>
>>>>
>>>> The cluster centers are initialized from the first k distinct events in
>>>> the stream, where k is the number of cluster centers.
>>>>
>>>> The model is retrained every x events, i.e. each time another x events
>>>> have been received.
>>>>
>>>
>>> Does this mean that, at any point in time, the maximum number of input
>>> points used by the training process is x? Also, how is the training
>>> process carried out? I assume the training doesn't happen in real time.
>>>
>>
>> Training is carried out on the data points accumulated so far, which
>> depends on the window used. Data is collected over the given window by
>> appending each event to an array list.
>>
>> Once an event expires from the window, the corresponding element is
>> removed from the array list.
>>
>> Each time x data points have been received, the data accumulated in the
>> array list is sent to be clustered and new cluster centers are computed.
>> The training is carried out in real time, on the data available in the
>> array list at the time it is sent for clustering.
>>
>> The training process includes:
>>
>>    1. Initialize the cluster centers from the distinct values among the
>>    first k data points in the data set. If there are fewer than k distinct
>>    data points, the number of cluster centers is set to the number of
>>    distinct data points.
>>
>>    2. Assign each data point in the given data set to the nearest
>>    available cluster center.
>>
>>    3. Compute the new cluster centers by taking the average value of the
>>    data assigned to each cluster center.
>>
>>    4. Reassign the values in the data set and recompute the cluster
>>    centers until the cluster center values no longer change or the number
>>    of iterations is reached.
>>
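
The four steps above can be sketched as follows. This is only an
illustration under stated assumptions: data points are single numeric
values, the distance is the absolute difference, and the initial centers
are the first k distinct values in arrival order. The function name
train_kmeans is hypothetical and is not the extension's API.

```python
def train_kmeans(data, k, max_iterations):
    """Deterministic 1-D k-means sketch; returns the list of centers."""
    # Step 1: take the first k distinct values as initial centers.
    # With fewer than k distinct values we simply end up with fewer centers.
    centers = []
    for value in data:
        if value not in centers:
            centers.append(value)
        if len(centers) == k:
            break

    for _ in range(max_iterations):
        # Step 2: assign every point to its nearest center
        # (ties go to the lowest cluster index).
        clusters = [[] for _ in centers]
        for value in data:
            nearest = min(range(len(centers)),
                          key=lambda j: abs(value - centers[j]))
            clusters[nearest].append(value)

        # Step 3: each new center is the average of its assigned points;
        # an empty cluster keeps its previous center.
        new_centers = [sum(points) / len(points) if points else centers[j]
                       for j, points in enumerate(clusters)]

        # Step 4: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers


centers = train_kmeans([1.0, 2.0, 1.0, 10.0, 11.0, 12.0], k=2, max_iterations=10)
print(centers)
```

Because nothing in the sketch is random, running it twice on the same
input gives the same centers.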
>>
>>
>> An option can be given to train the model only on the first x events, or
>> to retrain it every x data points received.
>>
>>
>>
>>
>>
>>>
>>>> After the first x events have been received, an output is emitted for
>>>> each incoming event. The output consists of the value of the cluster
>>>> center to which the data point belongs, the id of that cluster center,
>>>> and the distance from the data point to that cluster center.
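
As a rough illustration of the per-event output described above, the
sketch below assumes 1-D values, the absolute difference as distance, and
the cluster id being the index of the center in a list. The function name
cluster_output is hypothetical.

```python
def cluster_output(value, centers):
    """Return (center value, cluster id, distance) for one data point."""
    # The cluster id is assumed to be the index of the nearest center.
    cluster_id = min(range(len(centers)),
                     key=lambda j: abs(value - centers[j]))
    center = centers[cluster_id]
    return center, cluster_id, abs(value - center)


print(cluster_output(9.5, [1.0, 11.0]))  # (11.0, 1, 1.5)
```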
>>>>
>>>>
>>>>
>>>> Clustering can be performed over a given window implementation, i.e.
>>>> time, time batch, or length.
>>>>
>>>> --
>>>> Malith Jayasinghe
>>>>
>>>> WSO2, Inc. (http://wso2.com)
>>>> Email   :[email protected]
>>>> Mobile :0770704040
>>>> Blog     :https://medium.com/@malith.jayasinghe
>>>> Lean . Enterprise . Middleware
>>>>
>>>> _______________________________________________
>>>> Architecture mailing list
>>>> [email protected]
>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>
>>>>
>>>
>>>
>>> --
>>> Thanks & Regards,
>>>
>>> Fazlan Nazeem
>>>
>>> *Senior Software Engineer*
>>>
>>> *WSO2 Inc*
>>> Mobile : +94772338839
>>> [email protected]
>>>
>>
>>
>> --
>> Sachini Siriwardene
>> Software Engineering Intern
>>
>> +94774274374
>>
>



-- 
Thanks & Regards,

Fazlan Nazeem

*Senior Software Engineer*

*WSO2 Inc*
Mobile : +94772338839
[email protected]
