[ https://issues.apache.org/jira/browse/SPARK-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiangrui Meng updated SPARK-8402:
---------------------------------
Shepherd: Xiangrui Meng
> DP means clustering
> --------------------
>
> Key: SPARK-8402
> URL: https://issues.apache.org/jira/browse/SPARK-8402
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Reporter: Meethu Mathew
> Assignee: Meethu Mathew
> Labels: features
>
> At present, all the clustering algorithms in MLlib require the number of
> clusters to be specified in advance.
> The Dirichlet process (DP) is a popular non-parametric Bayesian prior for
> mixture models that allows flexible clustering of data without having to
> specify the number of clusters a priori.
> DP-means is a non-parametric clustering algorithm that uses a penalty
> parameter 'lambda' to control the creation of new clusters: a point whose
> squared distance to every existing cluster center exceeds lambda starts a new
> cluster ["Revisiting k-means: New Algorithms via Bayesian Nonparametrics" by
> Brian Kulis and Michael I. Jordan].
> We follow the distributed implementation of DP-means proposed in
> "MLbase: Distributed Machine Learning Made Easy" by Xinghao Pan, Evan R.
> Sparks and Andre Wibisono.
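
For illustration, here is a minimal, serial Python sketch of the DP-means loop from Kulis and Jordan, assuming dense points given as tuples of floats. It is not the distributed MLbase scheme and not a Spark MLlib API; the names `dp_means`, `sq_dist` and `mean` are hypothetical. Each pass assigns every point to its nearest center, opening a new cluster when the nearest squared distance exceeds `lam`, then recomputes the means and drops any cluster that was left empty.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Sequence[float]]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def dp_means(data: List[Sequence[float]], lam: float, max_iter: int = 100):
    """Serial DP-means sketch (hypothetical helper, not Spark/MLbase code).

    Returns (centers, assignments). A point whose squared distance to every
    existing center exceeds `lam` opens a new cluster centered on itself.
    """
    centers: List[Point] = [mean(data)]  # start with one cluster at the global mean
    assign = [0] * len(data)
    for _ in range(max_iter):
        changed = False
        # Assignment step: nearest center, or a new cluster if all are too far.
        for i, x in enumerate(data):
            dists = [sq_dist(x, c) for c in centers]
            best = min(range(len(centers)), key=dists.__getitem__)
            if dists[best] > lam:
                centers.append(tuple(x))
                best = len(centers) - 1
            if best != assign[i]:
                assign[i] = best
                changed = True
        # Update step: recompute means and drop clusters that lost all points.
        members: Dict[int, List[Sequence[float]]] = {}
        for i, c in enumerate(assign):
            members.setdefault(c, []).append(data[i])
        kept = sorted(members)
        remap = {old: new for new, old in enumerate(kept)}
        centers = [mean(members[c]) for c in kept]
        assign = [remap[c] for c in assign]
        if not changed:  # assignments stable, so the means are stable too
            break
    return centers, assign
```

For example, with `lam = 4.0` the six points `(0,0), (0,1), (1,0), (10,10), (10,11), (11,10)` end up in two clusters centered at (1/3, 1/3) and (31/3, 31/3). A larger `lam` makes new clusters harder to open and yields fewer clusters; a smaller one yields more.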
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)