[ https://issues.apache.org/jira/browse/SPARK-14084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15882123#comment-15882123 ]
Nick Pentreath commented on SPARK-14084:
----------------------------------------

I guess we could have put SPARK-19071 into this ticket (sorry about that), but since SPARK-19071 also covers a longer-term plan for further optimizing parallel cross-validation (CV), I'm going to close this as Superseded By SPARK-19071. If watchers are still interested, please watch SPARK-19071. Thanks!

> Parallel training jobs in model selection
> -----------------------------------------
>
>                 Key: SPARK-14084
>                 URL: https://issues.apache.org/jira/browse/SPARK-14084
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>    Affects Versions: 2.0.0
>            Reporter: Xiangrui Meng
>
> In CrossValidator and TrainValidationSplit, we run training jobs one by one.
> If users have a big cluster, they might see speed-ups if we parallelize the
> job submission on the driver. The trade-off is that we might need to make
> multiple copies of the training data, which could be expensive. It is worth
> testing to figure out the best way to implement it.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)