[ 
https://issues.apache.org/jira/browse/SPARK-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279218#comment-16279218
 ] 

Bago Amirbekian edited comment on SPARK-22126 at 12/5/17 9:53 PM:
------------------------------------------------------------------

I started a discussion about potential solutions to this issue on this 
[gist|https://gist.github.com/MrBago/f501b9e7712dc6a67dc9fea24e309bf0]. I'm 
going to summarize the gist here and encourage further discussion to take 
place on this JIRA to increase visibility of the discussion.


I proposed that we add a new method to the `Estimator` interface: 
`fitMultiple(dataset, paramMaps): Array[Callable[Model]]`. The purpose of this 
method is to allow estimators to implement model-specific optimizations when 
fitting one model for each of several paramMaps. This API would also be used by 
`CrossValidator` and other meta-estimators when fitting multiple models in 
parallel.
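A minimal sketch of the proposed method, assuming simplified stand-in types: `Dataset`, `ParamMap`, `Model`, `MeanEstimator` and `fitAllInParallel` are hypothetical illustrations, not Spark's actual `org.apache.spark.ml` classes. The default `fitMultiple` returns one `Callable` per param map, each wrapping `fit`; an estimator with a model-specific optimization would override it. The helper shows how a meta-estimator such as `CrossValidator` might submit those callables to a thread pool.

```scala
import java.util.concurrent.{Callable, Executors}

object FitMultipleSketch {
  // Hypothetical stand-ins for Spark ML types; NOT the real Spark API.
  type Dataset = Seq[Double]
  type ParamMap = Map[String, Double]

  trait Model

  trait Estimator[M <: Model] {
    def fit(dataset: Dataset, paramMap: ParamMap): M

    // Proposed method: one Callable per ParamMap. This default simply wraps
    // fit(); an estimator with a model-specific optimization could override it.
    def fitMultiple(dataset: Dataset, paramMaps: Array[ParamMap]): Array[Callable[M]] =
      paramMaps.map { pm =>
        new Callable[M] { def call(): M = fit(dataset, pm) }
      }
  }

  // Hypothetical toy model/estimator: the dataset mean plus an "offset" param.
  final case class MeanModel(value: Double) extends Model

  object MeanEstimator extends Estimator[MeanModel] {
    def fit(dataset: Dataset, paramMap: ParamMap): MeanModel =
      MeanModel(dataset.sum / dataset.size + paramMap.getOrElse("offset", 0.0))
  }

  // Hypothetical helper: how a meta-estimator might run the callables in a
  // fixed-size pool. All tasks are submitted before any result is awaited.
  def fitAllInParallel[M <: Model](est: Estimator[M], data: Dataset,
      paramMaps: Array[ParamMap], parallelism: Int): Seq[M] = {
    val pool = Executors.newFixedThreadPool(parallelism)
    try {
      val futures = est.fitMultiple(data, paramMaps).toSeq.map(c => pool.submit(c))
      futures.map(_.get())
    } finally pool.shutdown()
  }
}
```

With this shape the caller controls parallelism (here a fixed-size pool), while the estimator controls what work each callable performs.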

[~WeichenXu123] suggested modifying the API to `fitMultiple(dataset: 
Dataset[_], paramMaps: Array[ParamMap]): Array[Callable[Map[Int, M]]]`. The 
reasoning is that allowing each callable to return multiple models will make it 
easier to schedule these tasks efficiently in parallel (e.g., we avoid 
scheduling tasks A and B where B simply waits on A).
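The suggested `Map[Int, M]` variant can be sketched the same way, again with hypothetical stand-in types (`Dataset`, `ParamMap`, `ScaledMeanModel`, `fitMultiple` and `fitAll` here are illustrations, not Spark API). In this sketch, param maps that share an assumed expensive parameter (`scale`) are grouped into one callable, which does the shared pass over the data once and then builds one model per param map, keyed by that param map's index in the input array. Because the shared work lives inside a single task, no scheduled task has to block waiting on another.

```scala
import java.util.concurrent.{Callable, Executors}

object GroupedFitSketch {
  // Hypothetical stand-ins for Spark ML types; NOT the real Spark API.
  type Dataset = Seq[Double]
  type ParamMap = Map[String, Double]

  final case class ScaledMeanModel(value: Double)

  // Variant API: each Callable may fit several models and returns them keyed
  // by the index of their ParamMap in `paramMaps`.
  def fitMultiple(dataset: Dataset,
      paramMaps: Array[ParamMap]): Array[Callable[Map[Int, ScaledMeanModel]]] =
    paramMaps.zipWithIndex
      .groupBy { case (pm, _) => pm.getOrElse("scale", 1.0) } // assumed "expensive" param
      .toArray
      .map { case (scale, group) =>
        new Callable[Map[Int, ScaledMeanModel]] {
          def call(): Map[Int, ScaledMeanModel] = {
            // Shared work, done once per group (stands in for a costly data pass).
            val scaledMean = dataset.map(_ * scale).sum / dataset.size
            // Cheap per-ParamMap work reuses the shared result.
            group.map { case (pm, idx) =>
              idx -> ScaledMeanModel(scaledMean + pm.getOrElse("offset", 0.0))
            }.toMap
          }
        }
      }

  // Runs all callables in a pool and merges the index -> model maps.
  def fitAll(dataset: Dataset, paramMaps: Array[ParamMap],
      parallelism: Int): Map[Int, ScaledMeanModel] = {
    val pool = Executors.newFixedThreadPool(parallelism)
    try {
      val futures = fitMultiple(dataset, paramMaps).toSeq.map(c => pool.submit(c))
      futures.map(_.get()).foldLeft(Map.empty[Int, ScaledMeanModel])(_ ++ _)
    } finally pool.shutdown()
  }
}
```

For three param maps where two share the same `scale`, this produces two callables rather than three, and the merged result still has one model per param map index.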



> Fix model-specific optimization support for ML tuning
> -----------------------------------------------------
>
>                 Key: SPARK-22126
>                 URL: https://issues.apache.org/jira/browse/SPARK-22126
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.3.0
>            Reporter: Weichen Xu
>
> Fix model-specific optimization support for ML tuning. This is discussed in 
> SPARK-19357; more discussion is here:
>  https://gist.github.com/MrBago/f501b9e7712dc6a67dc9fea24e309bf0



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
