[ 
https://issues.apache.org/jira/browse/GEARPUMP-55?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manu Zhang closed GEARPUMP-55.
------------------------------
    Resolution: Duplicate

> Add kmeans example
> ------------------
>
>                 Key: GEARPUMP-55
>                 URL: https://issues.apache.org/jira/browse/GEARPUMP-55
>             Project: Apache Gearpump
>          Issue Type: New Feature
>          Components: examples
>    Affects Versions: 0.8.0
>            Reporter: Kam Kasravi
>            Priority: Minor
>             Fix For: 0.8.1
>
>
> From [pangolulu|https://github.com/pangolulu]
> There is a document about streaming kmeans in Spark 
> (https://databricks.com/blog/2015/01/28/introducing-streaming-k-means-in-spark-1-2.html),
>  I think we can try to implement it on Gearpump. Here is my processor 
> topology on Gearpump:
> !https://cloud.githubusercontent.com/assets/5796671/14097520/93a2b498-f5a4-11e5-8df8-ef2b62c3b5ff.PNG!
> The `Source Processor` will produce points by time, then broadcast the point 
> to the `Distribution Processor`. The number of tasks of the `Distribution 
> Processor` is k, where each task save one center and the corresponding 
> points. When `Distribution Processor` receives a point from `Source 
> Processor`, it will calculate the distance of this point to its center, and 
> then send the distance along with the point and its `taskId` to the 
> `Collection Processor`. When `Collection Processor` receives the distance 
> from `Distribution Processor`, it will accumulate the number of current 
> points, determine if it's time to update center, choose the smallest distance 
> and then send the point along with its corresponding `Distribution Processor` 
> taskId by broadcast partitioner. When `Distribution Processor` receives the 
> result message, task with the corresponding `taskId` will accumulate the 
> point. If `Distribution Processor` receives that it's time to update center, 
> then all the tasks will update its corresponding center.
> This procedure is streaming and the center of cluster will change by time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to