[ https://issues.apache.org/jira/browse/SPARK-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342178#comment-14342178 ]
Sean Owen commented on SPARK-2430:
----------------------------------

There are a number of outstanding JIRAs about different approaches to clustering. This JIRA mentions emulating the scikit-learn APIs to some extent, which is roughly what the new Pipelines API does. Looking over these JIRAs, how many are still relevant, and can they be consolidated and updated to target the new API? I ask because I doubt much more is going to happen in the existing MLlib API.

> Standarized Clustering Algorithm API and Framework
> --------------------------------------------------
>
>                 Key: SPARK-2430
>                 URL: https://issues.apache.org/jira/browse/SPARK-2430
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: RJ Nowling
>            Priority: Minor
>
> Recently, there has been a chorus of voices on the mailing lists about adding
> new clustering algorithms to MLlib. To support these additions, we should
> develop a common framework and API to reduce code duplication and keep the
> APIs consistent.
> At the same time, we can also expand the current API to incorporate requested
> features such as arbitrary distance metrics or pre-computed distance matrices.
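To make the request for "a common framework and API" with "arbitrary distance metrics or pre-computed distance matrices" concrete, here is a minimal single-machine sketch in the scikit-learn style the comment refers to. Everything in it is hypothetical and not part of MLlib or the Pipelines API: the `Clusterer` base class, the `KMedoids` subclass, and the `metric` parameter are illustrative names that only borrow scikit-learn conventions (`fit`, `fit_predict`, `labels_`, `metric="precomputed"`). k-medoids is used here because it needs nothing but pairwise distances.

```python
import math
import random
from typing import Callable, List, Sequence, Union

Point = Sequence[float]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class Clusterer:
    """Hypothetical common base class: resolves the distance source once so
    each algorithm only works against a distance matrix."""

    def __init__(self, metric: Union[Metric, str] = euclidean):
        # Either a callable metric or the string "precomputed".
        self.metric = metric

    def _distance_matrix(self, X) -> List[List[float]]:
        if self.metric == "precomputed":
            # X is already an n x n matrix of pairwise distances.
            return [list(row) for row in X]
        n = len(X)
        return [[self.metric(X[i], X[j]) for j in range(n)] for i in range(n)]

    def fit(self, X) -> "Clusterer":
        self.labels_ = self._fit_distances(self._distance_matrix(X))
        return self

    def fit_predict(self, X) -> List[int]:
        return self.fit(X).labels_

    def _fit_distances(self, D: List[List[float]]) -> List[int]:
        raise NotImplementedError  # each algorithm implements only this step


def _assign(D: List[List[float]], medoids: List[int]) -> List[int]:
    # Label each point with the index of its nearest medoid.
    return [min(range(len(medoids)), key=lambda c: D[i][medoids[c]])
            for i in range(len(D))]


class KMedoids(Clusterer):
    """Hypothetical example algorithm: alternating k-medoids, which needs
    only pairwise distances and so works with any metric."""

    def __init__(self, k: int, metric: Union[Metric, str] = euclidean,
                 max_iter: int = 100, seed: int = 0):
        super().__init__(metric)
        self.k, self.max_iter, self.seed = k, max_iter, seed

    def _fit_distances(self, D: List[List[float]]) -> List[int]:
        medoids = random.Random(self.seed).sample(range(len(D)), self.k)
        for _ in range(self.max_iter):
            labels = _assign(D, medoids)
            new_medoids = []
            for c, old in enumerate(medoids):
                members = [i for i, lab in enumerate(labels) if lab == c]
                if not members:  # keep the old medoid if a cluster empties
                    new_medoids.append(old)
                    continue
                # New medoid: the member with the smallest total distance to the others.
                new_medoids.append(min(members, key=lambda m: sum(D[m][j] for j in members)))
            if new_medoids == medoids:
                break
            medoids = new_medoids
        self.medoid_indices_ = medoids
        return _assign(D, medoids)
```

Usage: `KMedoids(k=2).fit_predict(points)` uses the default Euclidean metric, `KMedoids(k=2, metric=my_metric)` accepts any callable, and `KMedoids(k=2, metric="precomputed").fit_predict(D)` takes a distance matrix directly. The design choice is that the shared base class handles the distance source, so a new algorithm implements only `_fit_distances` and never duplicates metric handling. The O(n²) matrix makes this an illustration of the API shape, not a distributed design.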