Re: Contributing to MLlib: Proposal for Clustering Algorithms

Hector Yee Tue, 08 Jul 2014 13:02:33 -0700

For big data applications, I would say the most useful would be hierarchical
k-means with backtracking and support for querying the k nearest centroids.
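
To make the "k nearest centroids" part concrete, here is a minimal sketch in
plain Python. It assumes a flat list of centroids and Euclidean distance; the
function name k_nearest_centroids is hypothetical and is not an MLlib API, and
the hierarchical search with backtracking is not covered here.

```python
import heapq
import math


def k_nearest_centroids(point, centroids, k):
    """Return the indices of the k centroids closest to `point`.

    Hypothetical helper for illustration only: brute-force search over a
    flat list of centroids using Euclidean distance.
    """
    def distance_to(i):
        return math.dist(point, centroids[i])

    # nsmallest returns the indices ordered from nearest to farthest.
    return heapq.nsmallest(k, range(len(centroids)), key=distance_to)


# Toy data, made up for this example.
centroids = [(0.0, 0.0), (5.0, 5.0), (1.0, 1.0), (10.0, 0.0)]
nearest = k_nearest_centroids((0.9, 0.8), centroids, k=2)
print(nearest)
```

A hierarchical version would presumably apply the same lookup at each level of
the centroid tree instead of scanning every centroid.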



On Tue, Jul 8, 2014 at 10:54 AM, RJ Nowling <[email protected]> wrote:

> Hi all,
>
> MLlib currently has one clustering algorithm implementation, KMeans.
> It would benefit from having implementations of other clustering
> algorithms such as MiniBatch KMeans, Fuzzy C-Means, Hierarchical
> Clustering, and Affinity Propagation.
>
> I recently submitted a PR [1] for a MiniBatch KMeans implementation,
> and I saw an email on this list about interest in implementing Fuzzy
> C-Means.
>
> Based on Sean Owen's review of my MiniBatch KMeans code, it became
> apparent that before I implement more clustering algorithms, it would
> be useful to hammer out a framework to reduce code duplication and
> implement a consistent API.
>
> I'd like to gauge the interest and goals of the MLlib community:
>
> 1. Are you interested in having more clustering algorithms available?
>
> 2. Is the community interested in specifying a common framework?
>
> Thanks!
> RJ
>
> [1] - https://github.com/apache/spark/pull/1248
>
>
> --
> em [email protected]
> c 954.496.2314
>



-- 
Yee Yang Li Hector <http://google.com/+HectorYee>
