[ https://issues.apache.org/jira/browse/SPARK-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Rosen resolved SPARK-2694. ------------------------------- Resolution: Incomplete > machine learning > ---------------- > > Key: SPARK-2694 > URL: https://issues.apache.org/jira/browse/SPARK-2694 > Project: Spark > Issue Type: Documentation > Components: MLlib > Affects Versions: 1.0.0 > Environment: Linux > Reporter: Akash > Labels: Algorithm > Fix For: 1.0.0 > > Original Estimate: 0h > Remaining Estimate: 0h > > Machine Learning Algorithm > Given an initial set of k means m1(1),…,mk(1) (see below), the algorithm > proceeds by alternating between two steps: > Assignment step: Assign each observation to the cluster whose mean yields > the least within-cluster sum of squares . Since the sum of squares is the > squared Euclidean distance, this is intuitively the "nearest" mean. > (Mathematically, this means partitioning the observations according to the > Voronoi diagram generated by the means). > Update step: Calculate the new means to be the centroids of the observations > in the new clusters. > Since the arithmetic mean is a least-squares estimator, this also > minimizes the within-cluster sum of squares objective. > The algorithm has converged when the assignments no longer change. Since both > steps optimize the within-cluster sum of squares objective, and there only > exists a finite number of such partitionings, the algorithm must converge to > a (local) optimum. > The algorithm is used for assigning objects to the nearest cluster by > distance. The standard algorithm aims at minimizing the WCSS objective, and > thus assigns by "least sum of squares", which is exactly equivalent to > assigning by the smallest Euclidean distance. Using a different distance > function other than (squared) Euclidean distance may stop the algorithm from > converging.[citation needed] Various modifications of k-means such as > spherical k-means and k-medoids have been proposed to allow using other > distance measures. -- This message was sent by Atlassian JIRA (v6.2#6252)