[ https://issues.apache.org/jira/browse/SPARK-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Rosen resolved SPARK-2694.
-------------------------------
Resolution: Incomplete
> machine learning
> ----------------
>
> Key: SPARK-2694
> URL: https://issues.apache.org/jira/browse/SPARK-2694
> Project: Spark
> Issue Type: Documentation
> Components: MLlib
> Affects Versions: 1.0.0
> Environment: Linux
> Reporter: Akash
> Labels: Algorithm
> Fix For: 1.0.0
>
> Original Estimate: 0h
> Remaining Estimate: 0h
>
> Machine Learning Algorithm
> Given an initial set of k means $m_1^{(1)}, \ldots, m_k^{(1)}$, the standard
> k-means algorithm proceeds by alternating between two steps:
> Assignment step: Assign each observation to the cluster whose mean yields
> the least within-cluster sum of squares (WCSS). Since the sum of squares is
> the squared Euclidean distance, this is intuitively the "nearest" mean.
> (Mathematically, this means partitioning the observations according to the
> Voronoi diagram generated by the means.)
> Update step: Set each new mean to the centroid (arithmetic mean) of the
> observations assigned to its cluster. Since the arithmetic mean is a
> least-squares estimator, this step also minimizes the within-cluster sum of
> squares objective for the current assignment.
> The algorithm has converged when the assignments no longer change. Since
> neither step increases the within-cluster sum of squares objective, and only
> a finite number of such partitionings exist, the algorithm must converge to
> a (local) optimum.
> The algorithm is used for assigning objects to the nearest cluster by
> distance. The standard algorithm aims to minimize the WCSS objective, and
> thus assigns by "least sum of squares", which is exactly equivalent to
> assigning by the smallest Euclidean distance, because squaring preserves the
> order of non-negative distances. Using a distance function other than
> (squared) Euclidean distance may stop the algorithm from converging, since
> the mean computed in the update step is then no longer guaranteed to reduce
> the objective. Various modifications of k-means, such as spherical k-means
> and k-medoids, have been proposed to allow using other distance measures.
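The objective and the two steps described above can be written compactly as follows. This is a sketch in conventional notation, assuming clusters $S_i^{(t)}$ and means $m_i^{(t)}$ at iteration $t$ (extending the $m_i^{(1)}$ notation used for the initial means), with ties in the assignment step broken arbitrarily. The gradient line shows why the centroid is the least-squares choice: the per-cluster sum of squares is strictly convex in $m$, so its unique minimizer is where the gradient vanishes.

```latex
% Within-cluster sum of squares (WCSS) for clusters S_1, ..., S_k with means m_1, ..., m_k
\mathrm{WCSS} = \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - m_i \rVert^2

% Assignment step at iteration t
S_i^{(t)} = \{\, x : \lVert x - m_i^{(t)} \rVert^2 \le \lVert x - m_j^{(t)} \rVert^2 \ \ \forall j,\ 1 \le j \le k \,\}

% Update step
m_i^{(t+1)} = \frac{1}{\lvert S_i^{(t)} \rvert} \sum_{x \in S_i^{(t)}} x

% Why the mean: set the gradient of one cluster's sum of squares to zero
\nabla_{m} \sum_{x \in S_i} \lVert x - m \rVert^2 = -2 \sum_{x \in S_i} (x - m) = 0
\;\Longrightarrow\; m = \frac{1}{\lvert S_i \rvert} \sum_{x \in S_i} x

% Squaring is increasing on [0, \infty), so the nearest mean is the same either way
\arg\min_{i} \lVert x - m_i \rVert^2 = \arg\min_{i} \lVert x - m_i \rVert
```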
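A minimal pure-Python sketch of the same assign/update loop, stopping when the assignments no longer change. It assumes a non-empty list of points given as equal-length tuples of floats and initial means supplied by the caller (how to choose them is not covered in the description). The function names `assign`, `update`, `wcss`, and `lloyd` are hypothetical illustrations, not MLlib's API, and letting an empty cluster keep its previous mean is likewise an assumption, since the description does not say how to handle that case.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: Sequence[Point], means: Sequence[Point]) -> List[int]:
    """Assignment step: index of the nearest mean for each point.

    Ties go to the lowest index, because min() returns the first minimum.
    """
    return [
        min(range(len(means)), key=lambda i: squared_distance(p, means[i]))
        for p in points
    ]


def update(points: Sequence[Point], labels: Sequence[int],
           means: Sequence[Point]) -> List[Point]:
    """Update step: each new mean is the centroid of its assigned points.

    Assumption: a cluster with no points keeps its previous mean.
    """
    dim = len(points[0])
    new_means: List[Point] = []
    for i, old in enumerate(means):
        members = [p for p, lab in zip(points, labels) if lab == i]
        if members:
            new_means.append(tuple(
                sum(p[d] for p in members) / len(members) for d in range(dim)
            ))
        else:
            new_means.append(tuple(old))
    return new_means


def wcss(points: Sequence[Point], labels: Sequence[int],
         means: Sequence[Point]) -> float:
    """Within-cluster sum of squares for a given assignment and set of means."""
    return sum(squared_distance(p, means[lab]) for p, lab in zip(points, labels))


def lloyd(points: Sequence[Point], initial_means: Sequence[Point],
          max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Alternate update and assignment until no label changes.

    max_iter is only a safety cap; the returned labels are always the
    assignment induced by the returned means.
    """
    means = [tuple(m) for m in initial_means]
    labels = assign(points, means)
    for _ in range(max_iter):
        means = update(points, labels, means)
        new_labels = assign(points, means)
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
    return means, labels
```

For example, `lloyd([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (0, 1)])` first assigns three of the four points to the second mean, then settles on the means `(0.0, 0.5)` and `(10.0, 10.5)` with labels `[0, 0, 1, 1]` and a WCSS of 1.0. Squared distances are compared directly, since taking square roots would not change which mean is nearest.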
--
This message was sent by Atlassian JIRA
(v6.2#6252)