[ https://issues.apache.org/jira/browse/SPARK-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342178#comment-14342178 ]
Sean Owen commented on SPARK-2430:
----------------------------------
There are a number of outstanding JIRAs about different approaches to
clustering. This JIRA mentions emulating the scikit-learn APIs to some extent,
which is roughly what the new Pipelines API does. Looking over these JIRAs, how
many are still relevant, and can they be consolidated and updated to address
the new API? I ask because I doubt much is going to happen in the existing
MLlib API.
> Standardized Clustering Algorithm API and Framework
> ---------------------------------------------------
>
> Key: SPARK-2430
> URL: https://issues.apache.org/jira/browse/SPARK-2430
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Reporter: RJ Nowling
> Priority: Minor
>
> Recently, there has been a chorus of voices on the mailing lists about adding
> new clustering algorithms to MLlib. To support these additions, we should
> develop a common framework and API to reduce code duplication and keep the
> APIs consistent.
> At the same time, we can also expand the current API to incorporate requested
> features such as arbitrary distance metrics or pre-computed distance matrices.