[ 
https://issues.apache.org/jira/browse/SPARK-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-2694.
-------------------------------

    Resolution: Incomplete

> machine learning
> ----------------
>
>                 Key: SPARK-2694
>                 URL: https://issues.apache.org/jira/browse/SPARK-2694
>             Project: Spark
>          Issue Type: Documentation
>          Components: MLlib
>    Affects Versions: 1.0.0
>         Environment: Linux
>            Reporter: Akash
>              Labels: Algorithm
>             Fix For: 1.0.0
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Machine learning algorithm: k-means clustering (standard algorithm)
> Given an initial set of k means m_1^{(1)}, …, m_k^{(1)} (for example, k
> observations chosen at random), the algorithm proceeds by alternating
> between two steps:
>     Assignment step: Assign each observation to the cluster whose mean
>     yields the least within-cluster sum of squares (WCSS). Since the sum of
>     squares is the squared Euclidean distance, this is intuitively the
>     "nearest" mean. (Mathematically, this means partitioning the
>     observations according to the Voronoi diagram generated by the means.)
>
>     Update step: Calculate the new means to be the centroids of the
>     observations in the new clusters. Since the arithmetic mean is a
>     least-squares estimator (for a fixed cluster, it is the point that
>     minimizes the sum of squared distances to that cluster's observations),
>     this step also minimizes the WCSS objective.
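
For reference, the two steps can be written out in the usual textbook notation. The superscript (t) indexes the iteration (so the initial means carry the superscript (1)); the symbols x_p for an observation, S_i^{(t)} for the set of observations assigned to mean i, and t itself are notation added here for illustration, not taken from the issue:

```latex
% Assignment step: x_p joins the cluster whose mean is nearest
% (ties are broken so that each x_p lands in exactly one S_i^{(t)})
S_i^{(t)} = \left\{ x_p : \lVert x_p - m_i^{(t)} \rVert^2 \le \lVert x_p - m_j^{(t)} \rVert^2 \ \forall j,\ 1 \le j \le k \right\}

% Update step: each new mean is the centroid of its cluster
m_i^{(t+1)} = \frac{1}{\lvert S_i^{(t)} \rvert} \sum_{x_j \in S_i^{(t)}} x_j

% WCSS objective that both steps minimize
\mathrm{WCSS} = \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - m_i \rVert^2
```
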
> The algorithm has converged when the assignments no longer change. Since
> neither step can increase the WCSS objective and there are only finitely
> many ways to partition the observations, the algorithm must converge, but
> only to a local optimum, which need not be the global one.
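
A minimal pure-Python sketch of the loop described above, for illustration only: it is not Spark MLlib's KMeans implementation, and the function names (`assign`, `update`, `wcss`, `kmeans`), the random initialization from k observations, and the rule that an empty cluster keeps its previous mean are all assumptions made for this example.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance, i.e. one point's contribution to the WCSS."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: List[Point], means: List[Point]) -> List[int]:
    """Assignment step: index of the nearest mean per point (ties -> lowest index)."""
    return [
        min(range(len(means)), key=lambda i: squared_distance(p, means[i]))
        for p in points
    ]


def update(points: List[Point], labels: List[int], means: List[Point]) -> List[Point]:
    """Update step: move each mean to the centroid of its cluster.

    Assumption: an empty cluster simply keeps its previous mean.
    """
    new_means: List[Point] = []
    for i, old in enumerate(means):
        members = [p for p, lab in zip(points, labels) if lab == i]
        if not members:
            new_means.append(old)
            continue
        # zip(*members) yields one tuple of coordinates per dimension.
        new_means.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_means


def wcss(points: List[Point], labels: List[int], means: List[Point]) -> float:
    """Within-cluster sum of squares for a given assignment."""
    return sum(squared_distance(p, means[lab]) for p, lab in zip(points, labels))


def kmeans(points: List[Point], k: int, seed: int = 0,
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    rng = random.Random(seed)
    # Assumed initialization: k observations sampled without replacement.
    means = rng.sample(points, k)
    labels = assign(points, means)
    for _ in range(max_iter):
        means = update(points, labels, means)
        new_labels = assign(points, means)
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
    return means, labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    means, labels = kmeans(pts, k=2)
    print(labels)  # the two groups of three points land in different clusters
    print(wcss(pts, labels, means))  # approximately 8/3
```

Comparing the new assignments with the previous ones mirrors the convergence criterion in the text; the `max_iter` cap is only a safety net.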
> The algorithm assigns each object to the cluster with the nearest mean.
> The standard algorithm aims at minimizing the WCSS objective and thus
> assigns by "least sum of squares", which is exactly equivalent to assigning
> by the smallest Euclidean distance, because the square root is monotonic.
> Using a distance function other than (squared) Euclidean distance may stop
> the algorithm from converging: under another measure the mean need not
> minimize the total distance to a cluster's observations, so the update step
> can increase the objective. Various modifications of k-means, such as
> spherical k-means and k-medoids, have been proposed to allow using other
> distance measures.
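
As a hedged illustration of the k-medoids idea mentioned above, the sketch below implements a simple alternating ("Voronoi iteration" style) k-medoids, not PAM and not any Spark API. The names `k_medoids` and `manhattan` and the caller-supplied initial medoid indices are hypothetical. Because each center is restricted to an actual observation that minimizes the total distance within its cluster, the update step cannot increase the objective for any distance function, which is what lets it work with a non-Euclidean metric such as Manhattan distance.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Distance = Callable[[Sequence[float], Sequence[float]], float]


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Manhattan (L1) distance, used here as an example non-Euclidean metric."""
    return sum(abs(x - y) for x, y in zip(a, b))


def k_medoids(points: List[Point], init: List[int], dist: Distance,
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Hypothetical alternating k-medoids; medoids are indices into points."""
    medoids = list(init)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest medoid under the supplied distance.
        labels = [
            min(range(len(medoids)), key=lambda i: dist(p, points[medoids[i]]))
            for p in points
        ]
        # Update step: in each cluster, pick the member whose total distance
        # to all members of that cluster is smallest (ties -> first member).
        new_medoids: List[int] = []
        for i, current in enumerate(medoids):
            members = [j for j, lab in enumerate(labels) if lab == i]
            if not members:
                new_medoids.append(current)  # empty cluster: keep old medoid
                continue
            new_medoids.append(min(
                members,
                key=lambda c: sum(dist(points[c], points[j]) for j in members),
            ))
        if new_medoids == medoids:  # converged: medoids no longer change
            break
        medoids = new_medoids
    return medoids, labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    # Start with both medoids deliberately placed in the same group.
    print(k_medoids(pts, [0, 1], manhattan))  # ([0, 3], [0, 0, 0, 1, 1, 1])
```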



--
This message was sent by Atlassian JIRA
(v6.2#6252)
