[jira] [Created] (GEARPUMP-55) Add kmeans example

Kam Kasravi (JIRA) Wed, 20 Apr 2016 05:23:45 -0700

Kam Kasravi created GEARPUMP-55:
-----------------------------------

             Summary: Add kmeans example
                 Key: GEARPUMP-55
                 URL: https://issues.apache.org/jira/browse/GEARPUMP-55
             Project: Apache Gearpump
          Issue Type: New Feature
          Components: examples
    Affects Versions: 0.8.0
            Reporter: Kam Kasravi
            Priority: Minor
             Fix For: 0.8.1

>From [pangolulu|https://github.com/pangolulu]

There is a document about streaming kmeans in Spark
(https://databricks.com/blog/2015/01/28/introducing-streaming-k-means-in-spark-1-2.html),
I think we can try to implement it on Gearpump. Here is my processor topology
on Gearpump:
!https://cloud.githubusercontent.com/assets/5796671/14097520/93a2b498-f5a4-11e5-8df8-ef2b62c3b5ff.PNG!

The `Source Processor` will produce points by time, then broadcast the point to
the `Distribution Processor`. The number of tasks of the `Distribution
Processor` is k, where each task save one center and the corresponding points.
When `Distribution Processor` receives a point from `Source Processor`, it will
calculate the distance of this point to its center, and then send the distance
along with the point and its `taskId` to the `Collection Processor`. When
`Collection Processor` receives the distance from `Distribution Processor`, it
will accumulate the number of current points, determine if it's time to update
center, choose the smallest distance and then send the point along with its
corresponding `Distribution Processor` taskId by broadcast partitioner. When
`Distribution Processor` receives the result message, task with the
corresponding `taskId` will accumulate the point. If `Distribution Processor`
receives that it's time to update center, then all the tasks will update its
corresponding center.

This procedure is streaming and the center of cluster will change by time.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (GEARPUMP-55) Add kmeans example

Reply via email to