[ https://issues.apache.org/jira/browse/SPARK-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279218#comment-16279218 ]
Bago Amirbekian commented on SPARK-22126:
-----------------------------------------

I started a discussion about potential approaches to this issue on this gist. I'm going to summarize the gist here and encourage further discussion to take place on this JIRA, to increase its visibility.

I proposed that we add a new method to the `Estimator` interface: `fitMultiple(dataset, paramMaps): Array[Callable[Model]]`. The purpose of this method is to allow estimators to implement model-specific optimizations when fitting one model per paramMap. This API will also be used by `CrossValidator` and other meta-transformers when fitting multiple models in parallel.

[~WeichenXu123] suggested modifying the API to `fitMultiple(dataset: Dataset[_], paramMaps: Array[ParamMap]): Array[Callable[Map[Int, M]]]`. The reasoning is that allowing each callable to return multiple models will make it easier to schedule these tasks efficiently in parallel (e.g., we will avoid scheduling A and B where B simply waits on A).

> Fix model-specific optimization support for ML tuning
> -----------------------------------------------------
>
>                 Key: SPARK-22126
>                 URL: https://issues.apache.org/jira/browse/SPARK-22126
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.3.0
>            Reporter: Weichen Xu
>
> Fix model-specific optimization support for ML tuning. This is discussed in
> SPARK-19357
> more discussion is here
>  https://gist.github.com/MrBago/f501b9e7712dc6a67dc9fea24e309bf0
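
To make the second `fitMultiple` shape from the comment above more concrete, here is a minimal sketch in plain Scala. It is not Spark code and not the implementation under discussion: `ToyParams`, `ToyModel`, `FitMultipleSketch`, and `fitAll` are hypothetical stand-ins for `ParamMap`, the model type `M`, an estimator, and a caller such as `CrossValidator`. Grouping two settings per task is an arbitrary assumption, chosen only to show one callable returning several models keyed by their index in `paramMaps`.

```scala
import java.util.concurrent.{Callable, Executors}

// Hypothetical stand-ins for Spark's ParamMap and Model types (assumptions, not Spark classes).
final case class ToyParams(regParam: Double)
final case class ToyModel(regParam: Double, nRows: Int)

object FitMultipleSketch {
  // Toy analogue of the proposed fitMultiple: each Callable fits a group of
  // settings and returns the resulting models keyed by their index in paramMaps.
  def fitMultiple(
      data: Seq[Double],
      paramMaps: Array[ToyParams]): Array[Callable[Map[Int, ToyModel]]] = {
    // Assumption for illustration: settings at adjacent indices share work,
    // so they are fit together inside a single task.
    paramMaps.indices.grouped(2).map { group =>
      val task: Callable[Map[Int, ToyModel]] =
        () => group.map(i => i -> ToyModel(paramMaps(i).regParam, data.size)).toMap
      task
    }.toArray
  }

  // How a caller might run the tasks in parallel and restore the paramMaps order.
  def fitAll(
      data: Seq[Double],
      paramMaps: Array[ToyParams],
      parallelism: Int): Array[ToyModel] = {
    val pool = Executors.newFixedThreadPool(parallelism)
    try {
      val futures = fitMultiple(data, paramMaps).map(task => pool.submit(task))
      // Every task is independent, so no submitted task blocks waiting on another.
      val byIndex = futures.flatMap(_.get()).toMap
      paramMaps.indices.map(byIndex).toArray
    } finally pool.shutdown()
  }
}
```

Because each task returns a `Map[Int, M]`, the caller can submit every task to a pool immediately and reassemble the models by index afterwards, which is the scheduling benefit the comment describes.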