Adding Fazlan.

On Fri, Jun 9, 2017 at 9:54 AM, Sachini Siriwardene <[email protected]>
wrote:

> Hi Fazlan,
> Please find my replies inline.
>
> On Wed, Jun 7, 2017 at 3:48 PM, Fazlan Nazeem <[email protected]> wrote:
>
>> Hi Malith,
>>
>>
>> On Wed, Jun 7, 2017 at 3:04 PM, Malith Jayasinghe <[email protected]>
>> wrote:
>>
>>> Hello All,
>>>
>>>
>>>
>>> We are developing a k-means clustering extension. k-means is an
>>> unsupervised learning algorithm that provides a simple way to group a
>>> given data set into a certain number of clusters. The standard k-means
>>> clustering algorithm is nondeterministic: it can produce different results
>>> for the same input data when run multiple times, because it randomly
>>> chooses k observations from the data set and uses these as the initial
>>> means. Here we implement a variant of k-means in which the initial cluster
>>> centers are determined by the first k distinct values. This ensures the
>>> same output for a given input.
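A minimal sketch of the deterministic initialization described above, assuming one-dimensional numeric data points. The helper name `first_k_distinct` is hypothetical and not part of the extension. Because the first k distinct values are taken in arrival order, repeated runs over the same input always start from the same centers, unlike random seeding.

```python
from typing import List, Sequence


def first_k_distinct(points: Sequence[float], k: int) -> List[float]:
    """Return the first k distinct values in arrival order (hypothetical helper).

    Assumes 1-D numeric points. If fewer than k distinct values exist,
    all of them are returned, so the number of centers shrinks accordingly.
    """
    if k <= 0:
        return []
    centers: List[float] = []
    seen = set()
    for p in points:
        if p not in seen:
            seen.add(p)
            centers.append(p)
            if len(centers) == k:
                break
    return centers


# Same input, same initial centers on every run.
print(first_k_distinct([5.0, 5.0, 1.0, 9.0, 1.0, 3.0], 3))  # [5.0, 1.0, 9.0]
```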
>>>
>>>
>>>
>>> Function parameters:
>>>
>>>  - Data point to be clustered
>>>  - Number of cluster centers (k)
>>>  - Maximum number of iterations (m)
>>>  - Number of events after which the model is trained (x)
>>>
>>>
>>>
>>> The cluster centers are initialized from the first k distinct values
>>> (k being the number of cluster centers) in the stream.
>>>
>>> The model is trained every x events received.
>>>
>>
>> Does this mean that, at any point in time, the maximum number of input
>> points used by the training process is x? Also, how is the training process
>> carried out? I assume the training doesn't happen in real time.
>>
>
> Training is carried out on the data points accumulated in the window in
> use. The data is collected over the given window size by updating an array
> list.
>
> Once an event expires from the window, the corresponding element is removed
> from the array list.
>
>
>
> For every x data points received, the data accumulated in the array list is
> sent to be clustered and new cluster centers are computed. The training is
> carried out in real time, on the data available in the array list at the
> time it is sent for clustering.
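A sketch of the accumulate-and-retrain behaviour described above, assuming a length window over 1-D points. The class `LengthWindowBuffer` and its `train` callback are hypothetical; the reply says the extension uses an array list, which is modelled here with a `collections.deque` so that removing the oldest (expired) point is cheap.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class LengthWindowBuffer:
    """Hypothetical sketch: keep the last `window_size` points, retrain every x."""

    def __init__(self, window_size: int, x: int,
                 train: Callable[[List[float]], List[float]]) -> None:
        self.window: Deque[float] = deque()
        self.window_size = window_size
        self.x = x
        self.train = train  # stand-in for the actual clustering routine
        self.received = 0
        self.centers: Optional[List[float]] = None

    def add(self, point: float) -> None:
        self.window.append(point)
        if len(self.window) > self.window_size:
            self.window.popleft()  # the oldest event expires from the window
        self.received += 1
        if self.received % self.x == 0:
            # Cluster whatever is in the window at this moment.
            self.centers = self.train(list(self.window))


# Usage: window of 3 events, retrain every 2 events; the lambda records its input.
snapshots: List[List[float]] = []
buf = LengthWindowBuffer(window_size=3, x=2,
                         train=lambda pts: snapshots.append(pts) or pts)
for v in [1.0, 2.0, 3.0, 4.0, 5.0]:
    buf.add(v)
print(snapshots)  # [[1.0, 2.0], [2.0, 3.0, 4.0]]
```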
>
> The training process includes:
>
>    1. Initializing the cluster centers from the first k distinct data points
>       in the data set. If there are fewer than k distinct data points, the
>       number of cluster centers is set to the number of distinct data points.
>    2. Assigning each data point in the data set to the nearest of the
>       available cluster centers.
>    3. Computing the new cluster centers by averaging the data points assigned
>       to each cluster center.
>    4. Repeating the assignment and recomputation until the cluster center
>       values no longer change or the maximum number of iterations is reached.
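The four steps above can be sketched as a small Lloyd-style loop, assuming 1-D numeric points and absolute difference as the distance. The function names `nearest` and `train` are hypothetical; keeping a center unchanged when no points are assigned to it is also an assumption, since the thread does not say how empty clusters are handled.

```python
from typing import List, Sequence


def nearest(centers: Sequence[float], p: float) -> int:
    """Index of the closest center (1-D absolute distance)."""
    return min(range(len(centers)), key=lambda i: abs(p - centers[i]))


def train(points: Sequence[float], k: int, m: int) -> List[float]:
    """Sketch of steps 1-4 for 1-D points (hypothetical function)."""
    if k <= 0:
        return []
    # Step 1: first k distinct values, in arrival order.
    centers: List[float] = []
    for p in points:
        if p not in centers:
            centers.append(p)
            if len(centers) == k:
                break
    for _ in range(m):
        # Step 2: assign each point to its nearest center.
        groups: List[List[float]] = [[] for _ in centers]
        for p in points:
            groups[nearest(centers, p)].append(p)
        # Step 3: new center = mean of its points (keep the old one if empty).
        new_centers = [sum(g) / len(g) if g else c
                       for g, c in zip(groups, centers)]
        # Step 4: stop early once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers


print(train([1.0, 2.0, 10.0, 11.0], k=2, m=10))  # [1.5, 10.5]
```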
>
>
>
> An option can be given to train the model only once, on the first x events,
> or to retrain it every x data points received.
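A sketch of the two training modes mentioned above. The class `RetrainPolicy` and its `train_once` flag are hypothetical names used only for illustration; the extension's actual option name is not given in the thread.

```python
class RetrainPolicy:
    """Hypothetical sketch of the two training modes.

    train_once=True: train a single time, when the first x events have arrived.
    train_once=False: retrain on every x-th event received.
    """

    def __init__(self, x: int, train_once: bool) -> None:
        self.x = x
        self.train_once = train_once
        self.received = 0
        self.trained = False

    def should_train(self) -> bool:
        """Call once per incoming event; returns True when a training run is due."""
        self.received += 1
        if self.received % self.x != 0:
            return False
        if self.train_once and self.trained:
            return False
        self.trained = True
        return True


once = RetrainPolicy(x=2, train_once=True)
print([once.should_train() for _ in range(6)])  # [False, True, False, False, False, False]
```

With `train_once=False` the same loop would return `True` on every second event.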
>
>
>
>
>
>>
>>> After receiving the first x events, an output is given for each subsequent
>>> event. The output consists of the value of the cluster center to which the
>>> data point belongs, the id of that cluster center, and the distance of the
>>> data point from the cluster center.
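A sketch of the per-event output described above, assuming 1-D points, absolute difference as the distance, and cluster ids equal to the centers' positions in the trained list. `ClusterOutput` and `classify` are hypothetical names; the extension's real output attribute names are not given in the thread.

```python
from typing import NamedTuple, Sequence


class ClusterOutput(NamedTuple):
    """Per-event output fields (hypothetical names)."""
    center: float
    center_id: int
    distance: float


def classify(centers: Sequence[float], point: float) -> ClusterOutput:
    """Assign a 1-D point to its nearest trained center and report the distance."""
    best = min(range(len(centers)), key=lambda i: abs(point - centers[i]))
    return ClusterOutput(centers[best], best, abs(point - centers[best]))


print(classify([1.5, 10.5], 9.0))
# ClusterOutput(center=10.5, center_id=1, distance=1.5)
```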
>>>
>>>
>>>
>>> The clustering can be performed over a given window implementation, i.e.
>>> a time, time batch, or length window.
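For comparison with the length-window case, a sketch of how a sliding time window could expire data before it is handed to training. `TimeWindow` is a hypothetical class; it assumes events arrive in timestamp order and treats an event as expired once it is `duration_ms` or more older than the newest event, which is an assumed boundary rule.

```python
from collections import deque
from typing import Deque, List, Tuple


class TimeWindow:
    """Hypothetical sliding time window feeding the clusterer."""

    def __init__(self, duration_ms: int) -> None:
        self.duration_ms = duration_ms
        self.events: Deque[Tuple[int, float]] = deque()  # (timestamp_ms, value)

    def add(self, timestamp_ms: int, value: float) -> None:
        self.events.append((timestamp_ms, value))
        # Expire events that have fallen out of the window.
        while self.events and self.events[0][0] <= timestamp_ms - self.duration_ms:
            self.events.popleft()

    def values(self) -> List[float]:
        """Data currently available for training."""
        return [v for _, v in self.events]


w = TimeWindow(duration_ms=1000)
for ts, v in [(0, 1.0), (500, 2.0), (1200, 3.0)]:
    w.add(ts, v)
print(w.values())  # [2.0, 3.0]
```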
>>>
>>> --
>>> Malith Jayasinghe
>>>
>>> WSO2, Inc. (http://wso2.com)
>>> Email   :[email protected]
>>> Mobile :0770704040
>>> Blog     :https://medium.com/@malith.jayasinghe
>>> Lean . Enterprise . Middleware
>>>
>>> _______________________________________________
>>> Architecture mailing list
>>> [email protected]
>>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>>
>>>
>>
>>
>> --
>> Thanks & Regards,
>>
>> Fazlan Nazeem
>>
>> *Senior Software Engineer*
>>
>> *WSO2 Inc*
>> Mobile : +94772338839
>> [email protected]
>>
>>
>>
>
>
> --
> Sachini Siriwardene
> Software Engineering Intern
>
> +94774274374
>
>
>


-- 
Malith Jayasinghe

WSO2, Inc. (http://wso2.com)
Email   :[email protected]
Mobile :0770704040
Blog     :https://medium.com/@malith.jayasinghe
<https://medium.com/@malith.jayasinghe>
Lean . Enterprise . Middleware
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture

Reply via email to