[
https://issues.apache.org/jira/browse/SPARK-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xiangrui Meng closed SPARK-4039.
--------------------------------
Resolution: Duplicate
> KMeans support sparse cluster centers
> -------------------------------------
>
> Key: SPARK-4039
> URL: https://issues.apache.org/jira/browse/SPARK-4039
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Affects Versions: 1.1.0
> Reporter: Antoine Amend
>
> When the number of features is not known in advance, it can be helpful to
> create sparse vectors using HashingTF.transform. However, KMeans converts the
> cluster centers to dense vectors
> (https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala#L307),
> which leads to OutOfMemory errors (even with a small k).
> Is there any way to keep the vectors sparse?
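
To make the memory concern in the description concrete, here is a rough
back-of-the-envelope sketch in plain Python (no Spark required). It assumes a
dense vector stores one 8-byte double per feature and a sparse vector stores a
4-byte int index plus an 8-byte double per non-zero entry, and it ignores
object and array headers. The feature dimension (2^20), k, and the number of
non-zeros are illustrative assumptions, not values taken from the issue, and
the function names are hypothetical.

```python
# Rough memory estimate for dense cluster centers vs. a sparse input vector.
# Assumptions (not from the issue): 8-byte doubles, 4-byte int indices,
# JVM object/array overhead ignored.

BYTES_PER_DOUBLE = 8
BYTES_PER_INT = 4


def dense_center_bytes(num_features):
    """Bytes needed for one dense vector with num_features entries."""
    return num_features * BYTES_PER_DOUBLE


def sparse_vector_bytes(num_nonzeros):
    """Bytes needed for one sparse vector: one index and one value per non-zero."""
    return num_nonzeros * (BYTES_PER_INT + BYTES_PER_DOUBLE)


def dense_centers_bytes(k, num_features):
    """Bytes needed to hold k dense cluster centers."""
    return k * dense_center_bytes(num_features)


if __name__ == "__main__":
    num_features = 1 << 20  # illustrative hashed feature space
    k = 10                  # illustrative number of clusters
    nnz = 100               # illustrative non-zeros per document

    print("one sparse input vector:", sparse_vector_bytes(nnz), "bytes")
    print("one dense center:       ", dense_center_bytes(num_features), "bytes")
    print("k dense centers:        ", dense_centers_bytes(k, num_features), "bytes")
```

The dense cost grows with the size of the hashed feature space rather than
with the number of non-zero entries, so a large hash space makes each center
expensive no matter how sparse the input data is. Any per-task copies of the
centers would multiply this figure further.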
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)