[ https://issues.apache.org/jira/browse/SPARK-19957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15925723#comment-15925723 ]
Nick Pentreath commented on SPARK-19957:
----------------------------------------

See https://issues.apache.org/jira/browse/SPARK-16832

> Inconsist KMeans initialization mode behavior between ML and MLlib
> ------------------------------------------------------------------
>
>                 Key: SPARK-19957
>                 URL: https://issues.apache.org/jira/browse/SPARK-19957
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.1.0
>            Reporter: yuhao yang
>            Priority: Minor
>
> When users set the initialization mode to "random", KMeans in ML and MLlib
> behaves inconsistently across multiple runs:
> MLlib basically uses a new Random for each run.
> ML KMeans, however, uses the default random seed, which is
> {code}this.getClass.getName.hashCode.toLong{code}, and keeps using the same
> number across multiple fits.
> I would expect the "random" initialization mode to be literally random.
> There are different solutions with different scopes of impact. Adjusting the
> HasSeed trait may have a broader impact (but may be worth discussing). We can
> always just set a random default seed in KMeans.
> Appreciate your feedback.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
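To make the reported difference concrete, here is a minimal, self-contained Scala sketch that does not use Spark at all. It assumes that "random" initialization boils down to picking k points with a generator seeded only by the seed value; `KMeansSeedDemo`, `randomInitCenters`, `fitWithDefaultSeed` and `fitWithFreshSeed` are hypothetical names for illustration, not Spark APIs. The default seed mirrors the expression quoted in the issue, assuming the class name is "org.apache.spark.ml.clustering.KMeans".

```scala
import scala.util.Random

// Illustrative sketch only (no Spark): a fixed default seed makes "random"
// initialization repeat across fits, while a freshly drawn seed does not.
object KMeansSeedDemo {

  // Mirrors the quoted default: this.getClass.getName.hashCode.toLong.
  // The class name is passed explicitly here (assumed, not read from Spark).
  def defaultSeed(className: String): Long = className.hashCode.toLong

  // Hypothetical stand-in for "random" init: pick k distinct indices uniformly
  // at random and return those points; the result depends only on `seed`.
  def randomInitCenters(points: IndexedSeq[Double], k: Int, seed: Long): IndexedSeq[Double] = {
    require(k > 0 && k <= points.length, "k must be in [1, points.length]")
    val rng = new Random(seed)
    rng.shuffle(points.indices.toVector).take(k).map(i => points(i))
  }

  // Behavior reported for ML KMeans: every fit that keeps the default seed
  // starts from exactly the same centers.
  def fitWithDefaultSeed(points: IndexedSeq[Double], k: Int): IndexedSeq[Double] =
    randomInitCenters(points, k, defaultSeed("org.apache.spark.ml.clustering.KMeans"))

  // Behavior reported for MLlib (and the "random default seed" option in the
  // issue): a new seed per run, so centers usually differ between runs.
  def fitWithFreshSeed(points: IndexedSeq[Double], k: Int): IndexedSeq[Double] =
    randomInitCenters(points, k, Random.nextLong())
}
```

Calling `fitWithDefaultSeed` twice on the same data returns identical centers, which is the repetition described in the report. In actual Spark ML code, callers who want different initializations per fit can pass an explicit seed themselves, e.g. `new org.apache.spark.ml.clustering.KMeans().setInitMode("random").setSeed(...)`. The trade-off behind the discussion is that a fixed default gives reproducibility, while a fresh seed gives independent runs.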