[ https://issues.apache.org/jira/browse/SPARK-6001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615502#comment-14615502 ]
Joseph K. Bradley commented on SPARK-6001:
------------------------------------------
[~yalamart] This should probably be done under the Pipelines API, via the
R-like stats design linked above. I'd recommend we wait to include this until
the initial (LinearRegression) PR for R-like stats is merged, after which this
JIRA can follow that design as an example.
> K-Means clusterer should return the assignments of input points to clusters
> ---------------------------------------------------------------------------
>
> Key: SPARK-6001
> URL: https://issues.apache.org/jira/browse/SPARK-6001
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Affects Versions: 1.2.1
> Reporter: Derrick Burns
> Priority: Minor
>
> The K-Means clusterer returns a KMeansModel that contains the cluster
> centers. However, when available, I suggest that the K-Means clusterer also
> return an RDD of the assignments of the input data to the clusters. While the
> assignments can be computed given the KMeansModel, why not return the
> assignments when they are available, to save re-computation costs?
> The K-means implementation at
> https://github.com/derrickburns/generalized-kmeans-clustering returns the
> assignments when available.
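
To illustrate the point in the description, here is a minimal plain-Python
sketch of Lloyd's algorithm that returns the assignments together with the
centers. It is not Spark's MLlib API or the implementation linked above; the
names kmeans, assign, and closest_center are hypothetical and exist only for
this example. The assignments from the last iteration are already in hand when
the loop ends, so returning them avoids a second pass over the data.

```python
# Plain-Python illustration only; this is NOT Spark's MLlib API.
# All function names here are hypothetical.
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_center(point: Point, centers: Sequence[Point]) -> int:
    """Index of the nearest center; ties go to the lowest index."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def assign(points: Sequence[Point], centers: Sequence[Point]) -> List[int]:
    """Recompute assignments from centers (the extra pass the issue wants to avoid)."""
    return [closest_center(p, centers) for p in points]


def kmeans(points: Sequence[Point],
           initial_centers: Sequence[Point],
           max_iterations: int = 20) -> Tuple[List[List[float]], List[int]]:
    """Run Lloyd's algorithm and return (centers, assignments)."""
    centers = [list(c) for c in initial_centers]
    assignments = assign(points, centers)
    for _ in range(max_iterations):
        # Update step: move each center to the mean of its members.
        for i in range(len(centers)):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
        # Assignment step against the updated centers.
        new_assignments = assign(points, centers)
        if new_assignments == assignments:
            break  # converged
        assignments = new_assignments
    # The assignments always correspond to the returned centers.
    return centers, assignments


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, assignments = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

A caller that only has the centers would need assign(points, centers) to
recover the same list, which costs one more full pass over the input.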
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)