[ https://issues.apache.org/jira/browse/SPARK-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694365#comment-14694365 ]
Joseph K. Bradley commented on SPARK-9918:
------------------------------------------

I like this idea. "runs" seems like it should be in a wrapper analogous to CrossValidator (choosing best model from multiple runs). And I agree we can probably get similar performance improvements by blocking the cluster centers and/or data points for higher-level BLAS ops.

> Remove runs from KMeans under the pipeline API
> ----------------------------------------------
>
>                 Key: SPARK-9918
>                 URL: https://issues.apache.org/jira/browse/SPARK-9918
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.5.0
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>
> This requires some discussion. I'm not sure whether `runs` is a useful
> parameter. It certainly complicates the implementation. We might want to
> optimize the k-means implementation with block matrix operations. In this
> case, having `runs` may not be worth the trade-offs.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
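
For illustration only, the "wrapper" idea in the comment above (run k-means several times and keep the best model, much as CrossValidator keeps the best parameter setting) can be sketched in plain Python. This is not Spark code and not a proposed API: `kmeans_1d`, `cost`, and `best_of_runs` are hypothetical names, the data is one-dimensional to keep the sketch short, and "best" is assumed to mean the lowest within-cluster sum of squared distances.

```python
import random


def kmeans_1d(points, k, seed, max_iter=20):
    """Single k-means run (Lloyd's algorithm) on 1-D data; returns the centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initialization from the data
    for _ in range(max_iter):
        # Assignment step: group each point with its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: (p - centers[j]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each center to its cluster mean (keep it if empty).
        updated = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
        if updated == centers:
            break  # converged
        centers = updated
    return centers


def cost(points, centers):
    """Within-cluster sum of squared distances (assumed selection criterion)."""
    return sum(min((p - c) ** 2 for c in centers) for p in points)


def best_of_runs(points, k, seeds):
    """Hypothetical wrapper: one independent run per seed, keep the cheapest."""
    models = [kmeans_1d(points, k, seed) for seed in seeds]
    return min(models, key=lambda m: cost(points, m))
```

With this shape, the single-run routine carries no `runs` bookkeeping at all; the multiple-run logic lives entirely in the wrapper, for example `best_of_runs(data, k=3, seeds=range(5))` for a list `data` of floats.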