[ 
https://issues.apache.org/jira/browse/SPARK-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342178#comment-14342178
 ] 

Sean Owen commented on SPARK-2430:
----------------------------------

There are a number of outstanding JIRAs about different approaches to 
clustering. This JIRA mentions emulating the scikit-learn APIs to some extent, 
which is roughly what the new Pipelines API does. Looking over these JIRAs, 
how many are still relevant, and can they be consolidated and updated to 
address the new API? I ask because I doubt much is going to happen in the 
existing MLlib API.

> Standardized Clustering Algorithm API and Framework
> ---------------------------------------------------
>
>                 Key: SPARK-2430
>                 URL: https://issues.apache.org/jira/browse/SPARK-2430
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: RJ Nowling
>            Priority: Minor
>
> Recently, there has been a chorus of voices on the mailing lists about adding 
> new clustering algorithms to MLlib.  To support these additions, we should 
> develop a common framework and API to reduce code duplication and keep the 
> APIs consistent.
> At the same time, we can also expand the current API to incorporate requested 
> features such as arbitrary distance metrics or pre-computed distance matrices.


