[ https://issues.apache.org/jira/browse/SPARK-10785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14990948#comment-14990948 ]
holdenk commented on SPARK-10785:
---------------------------------

So, looking at the tree work, it looks like it just did a groupByKey for each column index, which isn't very useful if we've only got a single column (although it's quite possible I misread some of that). I can do something useful with just a single column, though: sort the RDD and use the sorted RDD to pick out the quantiles. Does that sound like what we are looking for?

> Scale QuantileDiscretizer using distributed binning
> ---------------------------------------------------
>
>                 Key: SPARK-10785
>                 URL: https://issues.apache.org/jira/browse/SPARK-10785
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: Joseph K. Bradley
>
> [SPARK-10064] improves binning in decision trees by distributing the
> computation. QuantileDiscretizer should do the same.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
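The single-column idea in the comment is to sort the column and then read the quantile boundaries off the sorted order. The Python sketch below illustrates it locally and does not use Spark: a plain sorted list stands in for the sorted RDD.

In a real Spark job, the analogous steps would presumably be `RDD.sortBy` followed by `RDD.zipWithIndex` and a filter on the target ranks. Several details are assumptions for illustration, not QuantileDiscretizer's actual implementation:

- the function name `sorted_quantile_splits`;
- the rank formula `i * n // num_buckets`;
- the open-ended, Bucketizer-style output.

```python
import math
from typing import List, Sequence


def sorted_quantile_splits(values: Sequence[float], num_buckets: int) -> List[float]:
    """Hypothetical helper: bucket split points for one column, found by sorting.

    Local stand-in for the distributed version: in Spark this would roughly be
    rdd.sortBy(...).zipWithIndex() followed by a filter on the target ranks and a
    collect(); here a sorted Python list plays the role of the sorted RDD.
    """
    if num_buckets < 2:
        raise ValueError("num_buckets must be at least 2")
    if not values:
        raise ValueError("values must be non-empty")

    ordered = sorted(values)  # the "sort on the RDD" step
    n = len(ordered)

    # Assumed rank of the i-th interior boundary, for i = 1 .. num_buckets - 1.
    # Since i < num_buckets, every rank is strictly less than n.
    target_ranks = {(i * n) // num_buckets for i in range(1, num_buckets)}

    # Equivalent of filtering (value, index) pairs on the index and collecting.
    # The set removes duplicate boundaries caused by repeated values.
    interior = sorted({v for idx, v in enumerate(ordered) if idx in target_ranks})

    # Open-ended on both sides so every value falls into some bucket.
    return [-math.inf] + interior + [math.inf]


if __name__ == "__main__":
    print(sorted_quantile_splits(list(range(1, 101)), 4))
```

For the values 1 to 100 and four buckets, the target ranks are 25, 50 and 75, so the sketch prints `[-inf, 26, 51, 76, inf]`.

The sort happens once over the whole column. That is the point of the single-column approach: it avoids a groupByKey keyed on column index, which would put an entire column on one reducer when there is only one column.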