[ 
https://issues.apache.org/jira/browse/SPARK-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279218#comment-16279218
 ] 

Bago Amirbekian commented on SPARK-22126:
-----------------------------------------

I started a discussion about potential solutions to this issue on this gist. 
I'm going to summarize the gist here and encourage further discussion to take 
place on this JIRA to increase its visibility.

I proposed that we add a new method to the `Estimator` interface, 
`fitMultiple(dataset, paramMaps): Array[Callable[Model]]`. The purpose of this 
method is to allow estimators to implement model-specific optimizations when 
fitting multiple models, one per paramMap. This API will also be used by 
`CrossValidator` and other meta transformers when fitting multiple models in 
parallel.
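
To make the proposed shape concrete, here is a minimal sketch in plain Scala. 
It does not use Spark's classes: `Dataset`, `ParamMap`, `Model`, and 
`MeanEstimator` below are hypothetical stand-ins, and the default 
`fitMultiple` body is only one assumed way an estimator without special 
optimizations might satisfy the signature.

```scala
import java.util.concurrent.Callable

// Stand-in types for illustration only; these are NOT Spark's classes.
object FitMultipleSketch {
  type Dataset = Seq[Double]
  type ParamMap = Map[String, Double]
  final case class Model(params: ParamMap, mean: Double)

  trait Estimator {
    // Fit a single model for one parameter setting.
    def fit(dataset: Dataset, paramMap: ParamMap): Model

    // Assumed default: one independent task per paramMap. An estimator that
    // can share work across parameter settings would override this method.
    def fitMultiple(dataset: Dataset,
                    paramMaps: Array[ParamMap]): Array[Callable[Model]] =
      paramMaps.map { pm =>
        new Callable[Model] {
          override def call(): Model = fit(dataset, pm)
        }
      }
  }

  // Toy estimator: the "model" is just the mean of the data.
  object MeanEstimator extends Estimator {
    override def fit(dataset: Dataset, paramMap: ParamMap): Model =
      Model(paramMap, dataset.sum / dataset.size)
  }
}
```

A caller such as a cross-validation loop could then run each returned task 
itself, for example with `tasks.map(_.call())` or by submitting the tasks to a 
thread pool.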

[~WeichenXu123] suggested modifying the API to `fitMultiple(dataset: 
Dataset[_], paramMaps: Array[ParamMap]): Array[Callable[Map[Int, M]]]`. The 
reasoning is that allowing each callable to return multiple models will make it 
easier to efficiently schedule these tasks in parallel (e.g., we will avoid 
scheduling A and B where B simply waits on A).
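
A rough sketch of that variant, again in plain Scala with hypothetical 
stand-in types rather than Spark's. The grouping rule (param maps that differ 
only in `maxIter` share one task) is an assumption chosen for illustration, as 
is reading the `Int` key as the index of the paramMap in the input array.

```scala
import java.util.concurrent.{Callable, Executors}

// Stand-in types for illustration only; these are NOT Spark's classes.
object GroupedFitSketch {
  type ParamMap = Map[String, Double]
  final case class Model(paramIndex: Int, maxIter: Int)

  // Hypothetical grouping: param maps that differ only in "maxIter" share one
  // task, standing in for an estimator that can reuse a single training run.
  // Every param map is assumed to contain a "maxIter" entry.
  def fitMultiple(paramMaps: Array[ParamMap]): Array[Callable[Map[Int, Model]]] = {
    val groups = paramMaps.zipWithIndex.groupBy { case (pm, _) => pm - "maxIter" }
    groups.values.toArray.map { group =>
      new Callable[Map[Int, Model]] {
        // One task produces every model in its group, keyed by input index.
        override def call(): Map[Int, Model] =
          group.map { case (pm, idx) => idx -> Model(idx, pm("maxIter").toInt) }.toMap
      }
    }
  }

  // Run the tasks on a fixed-size pool and merge the per-task results.
  def fitAll(paramMaps: Array[ParamMap], parallelism: Int): Map[Int, Model] = {
    val pool = Executors.newFixedThreadPool(parallelism)
    try {
      val futures = fitMultiple(paramMaps).map(task => pool.submit(task))
      futures.map(_.get()).foldLeft(Map.empty[Int, Model])(_ ++ _)
    } finally pool.shutdown()
  }
}
```

Because dependent models live inside the same callable, the scheduler never 
holds a thread for a task that is only waiting on another task.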

> Fix model-specific optimization support for ML tuning
> -----------------------------------------------------
>
>                 Key: SPARK-22126
>                 URL: https://issues.apache.org/jira/browse/SPARK-22126
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.3.0
>            Reporter: Weichen Xu
>
> Fix model-specific optimization support for ML tuning. This is discussed in 
> SPARK-19357.
> More discussion is here:
>  https://gist.github.com/MrBago/f501b9e7712dc6a67dc9fea24e309bf0



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

