Re: [Architecture] Siddhi: K-means Clustering extension

Sachini Siriwardene Thu, 08 Jun 2017 22:35:49 -0700

Hi Fazlan,
Yes, that is what happens.

On Fri, Jun 9, 2017 at 10:52 AM, Fazlan Nazeem <[email protected]> wrote:


> Hi Sachini,
>
> Okay. I think I misread the "every x events" part previously. This means
> that if x is 100, then once 200 events have been received we would have
> trained 2 models in total. +1 if that is the case.
>
>
> On Fri, Jun 9, 2017 at 9:56 AM, Malith Jayasinghe <[email protected]>
> wrote:
>
>> adding Fazlan
>>
>> On Fri, Jun 9, 2017 at 9:54 AM, Sachini Siriwardene <[email protected]>
>> wrote:
>>
>>> Hi Fazlan,
>>> Please find my replies inline.
>>>
>>> On Wed, Jun 7, 2017 at 3:48 PM, Fazlan Nazeem <[email protected]> wrote:
>>>
>>>> Hi Malith,
>>>>
>>>>
>>>> On Wed, Jun 7, 2017 at 3:04 PM, Malith Jayasinghe <[email protected]>
>>>> wrote:
>>>>
>>>>> Hello All,
>>>>>
>>>>>
>>>>>
>>>>> We are developing a k-means clustering extension. k-means is an
>>>>> unsupervised learning algorithm that provides a simple way to group a
>>>>> given data set into a certain number of clusters. The standard k-means
>>>>> clustering algorithm is nondeterministic: we can get different results
>>>>> for the same input data when we run the algorithm multiple times. The
>>>>> reason is that the algorithm randomly chooses k observations from the
>>>>> data set and uses these as the initial means. Here we implement a
>>>>> variant of k-means in which the initial cluster centers are determined
>>>>> by the first k distinct values. This ensures the same output for a
>>>>> given input.
>>>>>
>>>>>
>>>>>
>>>>> Function parameters:
>>>>>
>>>>> - Data point to be clustered
>>>>> - k: number of cluster centers
>>>>> - m: number of iterations
>>>>> - x: number of events for which the model is trained
>>>>>
>>>>>
>>>>>
>>>>> The cluster centers are initialized from the first k distinct events
>>>>> in the stream, where k is the number of cluster centers.
>>>>>
>>>>> The model is trained for every x events received.
>>>>>
>>>>
>>>> Does this mean that, at any point in time, the maximum number of input
>>>> points used by the training process is x? Also, how is the training
>>>> process carried out? I assume the training doesn't happen in real time.
>>>>
>>>
>>>   Training is carried out on the data points accumulated, which
>>> depends on the window used. The data is collected over a given window
>>> size by updating an array list.
>>>
>>> Once an event expires from the window, the corresponding element is
>>> removed from the array list.
>>>
>>>
>>>
>>> For every x data points received, the data accumulated in the array
>>> list is sent to be clustered and new cluster centers are computed.
>>> The training is carried out in real time, on the data available in the
>>> array list at the time it is sent for clustering.
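
A minimal Python sketch of the buffering described above, not the extension's actual code. It assumes a length window (the thread also mentions time and time batch windows), and the class name WindowedTrainer, its fields, and the train callback are all hypothetical names chosen for illustration.

```python
from collections import deque


class WindowedTrainer:
    """Hypothetical helper: buffers events for a length window and
    triggers training once for every x events received."""

    def __init__(self, window_length, x, train):
        self.buffer = deque()            # stands in for the array list
        self.window_length = window_length
        self.x = x
        self.train = train               # callback given a snapshot of the buffer
        self.arrivals = 0
        self.models_trained = 0

    def on_event(self, value):
        self.buffer.append(value)
        if len(self.buffer) > self.window_length:
            self.buffer.popleft()        # oldest event has expired from the window
        self.arrivals += 1
        if self.arrivals % self.x == 0:  # every x events received
            self.train(list(self.buffer))
            self.models_trained += 1


# Usage: with x = 100 and 200 events, training is triggered twice.
snapshots = []
trainer = WindowedTrainer(window_length=150, x=100, train=snapshots.append)
for i in range(200):
    trainer.on_event(float(i))
```

Under these assumptions the second training run sees only the 150 events still in the window, so the training set is bounded by the window size rather than by x.
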
>>>
>>> The training process includes:
>>>
>>>    1. Initialize the cluster centers from the first k distinct data
>>>    points in the data set. If the number of distinct data points is less
>>>    than k, the number of cluster centers is set to the number of distinct
>>>    data points.
>>>
>>>    2. Assign each data point in the data set to the nearest available
>>>    cluster center.
>>>
>>>    3. Compute the new cluster centers by taking the average value of the
>>>    data assigned to each cluster center.
>>>
>>>    4. Reassign the values in the data set and recompute the cluster
>>>    centers until the cluster center values no longer change or the number
>>>    of iterations is reached.
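
The four training steps can be sketched roughly as follows. This is an illustrative Python version under stated assumptions, not the extension's code: data points are taken to be one-dimensional numbers, distance is the absolute difference, and the function name train_kmeans is hypothetical.

```python
def train_kmeans(points, k, max_iterations):
    """Deterministic k-means sketch for 1-D values (illustrative only)."""
    # Step 1: initial centers are the first k distinct values; if fewer
    # than k distinct values exist, we end up with fewer centers.
    centers = []
    for p in points:
        if p not in centers:
            centers.append(p)
        if len(centers) == k:
            break

    for _ in range(max_iterations):
        # Step 2: assign every point to its nearest center
        # (ties go to the lower cluster index, which keeps runs repeatable).
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)

        # Step 3: each new center is the average of its assigned points;
        # an empty cluster keeps its previous center.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]

        # Step 4: stop when the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers

    return centers


centers = train_kmeans([1.0, 2.0, 1.0, 10.0, 11.0, 12.0], k=2, max_iterations=10)
few = train_kmeans([5.0, 5.0, 5.0], k=3, max_iterations=10)
```

Because the initial centers come from the input order rather than a random draw, repeated runs on the same list return the same centers.
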
>>>
>>>
>>>
>>> An option can be given to train the model only on the first x events,
>>> or to retrain it for every x data points received.
>>>
>>>
>>>
>>>
>>>
>>>>
>>>>> After receiving the first x events, an output is given for each event
>>>>> generated. The output consists of the cluster center value to which the
>>>>> data point belongs, the id of that cluster center, and the distance
>>>>> from the cluster center.
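
As a rough illustration of that per-event output, here is a Python sketch that assumes one-dimensional values, absolute difference as the distance, and the list index as the cluster id. The function name classify and the example centers are hypothetical.

```python
def classify(value, centers):
    """Return (center value, cluster id, distance) for one data point."""
    cluster_id = min(range(len(centers)),
                     key=lambda i: abs(value - centers[i]))
    center = centers[cluster_id]
    return center, cluster_id, abs(value - center)


# Example with two assumed cluster centers.
result_far = classify(9.0, [1.0, 11.0])
result_near = classify(1.5, [1.0, 11.0])
```
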
>>>>>
>>>>>
>>>>>
>>>>> The clustering can be performed for a given window implementation,
>>>>> i.e. time, time batch, or length.
>>>>>
>>>>> --
>>>>> Malith Jayasinghe
>>>>>
>>>>> WSO2, Inc. (http://wso2.com)
>>>>> Email   :[email protected]
>>>>> Mobile :0770704040
>>>>> Blog     :https://medium.com/@malith.jayasinghe
>>>>> Lean . Enterprise . Middleware
>>>>>
>>>>> _______________________________________________
>>>>> Architecture mailing list
>>>>> [email protected]
>>>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Thanks & Regards,
>>>>
>>>> Fazlan Nazeem
>>>>
>>>> *Senior Software Engineer*
>>>>
>>>> *WSO2 Inc*
>>>> Mobile : +94772338839
>>>> [email protected]
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Sachini Siriwardene
>>> Software Engineering Intern
>>>
>>> +94774274374
>>>
>>>
>>>
>>
>>
>>
>
>
>
>
>


-- 
Sachini Siriwardene
Software Engineering Intern

+94774274374

