[ 
https://issues.apache.org/jira/browse/SPARK-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134339#comment-14134339
 ] 

Derrick Burns commented on SPARK-3219:
--------------------------------------

The key abstractions that need to be added to the K-Means implementation to 
support interesting distance functions are: Point (P), Center (C), and 
Centroid.  With these in place, one can implement a distance function trait 
(called PointOps below) in a way that allows the implementer to pre-compute 
values for Point and Center, as is currently hard-coded for the fast squared 
Euclidean distance function in the 1.0.2 K-Means implementation.  Since the 
representations of Point and Center are abstracted, the implementer of the 
trait can use JBlas, Breeze, or whatever math library is preferred, again 
without touching the generic K-Means implementation.

  trait PointOps[P <: FP[T], C <: FP[T], T] {
    def distance(p: P, c: C, upperBound: Distance): Distance

    def userToPoint(v: Array[Double], index: Option[T]): P

    def centerToPoint(v: C): P

    def pointToCenter(v: P): C

    def centroidToCenter(v: Centroid): C

    def centroidToPoint(v: Centroid): P

    def centerMoved(v: P, w: C): Boolean

  }
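
As an illustration of how the trait above might be implemented, here is a 
minimal sketch for squared Euclidean distance with pre-computed norms.  The 
comment does not define FP, Distance, or Centroid, so the versions below are 
hypothetical stand-ins, as are the names EuPoint, EuCenter, FastEuclideanOps, 
and the epsilon threshold.  The meaning given to upperBound (skip the exact 
computation when a cheap lower bound already reaches it) and to centerMoved 
(compare a center's previous and new position) are assumptions as well, not 
the actual implementation.

```scala
object PointOpsSketch {
  type Distance = Double // assumption: distances are plain doubles

  // Hypothetical stand-in for FP[T]: raw coordinates plus an optional user index.
  trait FP[T] {
    def raw: Array[Double]
    def index: Option[T]
  }

  // Hypothetical Centroid: running sum and count of the points assigned to a cluster.
  final class Centroid(dim: Int) {
    val sum: Array[Double] = new Array[Double](dim)
    var count: Long = 0L
    def add(v: Array[Double]): Unit = {
      for (i <- 0 until dim) sum(i) += v(i)
      count += 1
    }
    def mean: Array[Double] = {
      require(count > 0, "cannot take the mean of an empty centroid")
      sum.map(_ / count)
    }
  }

  // The trait from the comment, reproduced unchanged.
  trait PointOps[P <: FP[T], C <: FP[T], T] {
    def distance(p: P, c: C, upperBound: Distance): Distance
    def userToPoint(v: Array[Double], index: Option[T]): P
    def centerToPoint(v: C): P
    def pointToCenter(v: P): C
    def centroidToCenter(v: Centroid): C
    def centroidToPoint(v: Centroid): P
    def centerMoved(v: P, w: C): Boolean
  }

  // Point and Center both carry their Euclidean norm, computed once at construction.
  final case class EuPoint(raw: Array[Double], index: Option[Int], norm: Double) extends FP[Int]
  final case class EuCenter(raw: Array[Double], index: Option[Int], norm: Double) extends FP[Int]

  object FastEuclideanOps extends PointOps[EuPoint, EuCenter, Int] {
    private val epsilon = 1e-4 // hypothetical convergence threshold

    private def dot(a: Array[Double], b: Array[Double]): Double = {
      require(a.length == b.length, "dimension mismatch")
      var s = 0.0
      var i = 0
      while (i < a.length) { s += a(i) * b(i); i += 1 }
      s
    }

    private def norm(a: Array[Double]): Double = math.sqrt(dot(a, a))

    // Squared Euclidean distance. Since (|p| - |c|)^2 <= |p - c|^2, the cached
    // norms give a cheap lower bound; if it already reaches upperBound, return
    // upperBound without computing the dot product (assumed semantics).
    def distance(p: EuPoint, c: EuCenter, upperBound: Distance): Distance = {
      val gap = p.norm - c.norm
      val lowerBound = gap * gap
      if (lowerBound >= upperBound) upperBound
      else math.max(p.norm * p.norm + c.norm * c.norm - 2.0 * dot(p.raw, c.raw), 0.0)
    }

    def userToPoint(v: Array[Double], index: Option[Int]): EuPoint = EuPoint(v, index, norm(v))
    def centerToPoint(v: EuCenter): EuPoint = EuPoint(v.raw, v.index, v.norm)
    def pointToCenter(v: EuPoint): EuCenter = EuCenter(v.raw, v.index, v.norm)
    def centroidToCenter(v: Centroid): EuCenter = { val m = v.mean; EuCenter(m, None, norm(m)) }
    def centroidToPoint(v: Centroid): EuPoint = { val m = v.mean; EuPoint(m, None, norm(m)) }

    // Assumed semantics: v is the center's previous position, w its new one.
    def centerMoved(v: EuPoint, w: EuCenter): Boolean =
      distance(v, w, Double.PositiveInfinity) > epsilon * epsilon
  }
}
```

In this sketch the generic clusterer would only ever call the PointOps 
methods, so swapping in a different distance function means supplying another 
object with its own Point and Center representations and its own cached 
values.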

> K-Means clusterer should support Bregman distance functions
> -----------------------------------------------------------
>
>                 Key: SPARK-3219
>                 URL: https://issues.apache.org/jira/browse/SPARK-3219
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Derrick Burns
>            Assignee: Derrick Burns
>
> The K-Means clusterer supports the Euclidean distance metric.  However, it is 
> rather straightforward to support Bregman 
> (http://machinelearning.wustl.edu/mlpapers/paper_files/BanerjeeMDG05.pdf) 
> distance functions, which would increase the utility of the clusterer 
> tremendously.
> I have modified the clusterer to support pluggable distance functions.  
> However, I notice that there are hundreds of outstanding pull requests.  If 
> someone is willing to work with me to sponsor the work through the process, I 
> will create a pull request.  Otherwise, I will just keep my own fork.


