[https://issues.apache.org/jira/browse/SPARK-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279218#comment-16279218]
Bago Amirbekian edited comment on SPARK-22126 at 12/5/17 9:53 PM:
------------------------------------------------------------------
I started a discussion about potential solutions to this issue in this
[gist|https://gist.github.com/MrBago/f501b9e7712dc6a67dc9fea24e309bf0]. I'm
going to summarize the gist here and encourage further discussion to take
place on this JIRA, to increase the visibility of the discussion.
I proposed that we add a new method to the `Estimator` interface:
`fitMultiple(dataset, paramMaps): Array[Callable[Model]]`. The purpose of this
method is to allow estimators to implement model-specific optimizations when
fitting one model for each of multiple paramMaps. This API will also be used by
`CrossValidator` and other meta transformers when fitting multiple models in
parallel.
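A minimal sketch of what the proposed default could look like, in Scala. Spark is not on the classpath here: `ParamMap`, `Model`, `MeanModel`, `MeanEstimator` and `Tuning.fitAllInParallel` are hypothetical stand-ins, and a `Seq[Double]` replaces `Dataset[_]`; only `java.util.concurrent.Callable` and `Executors` are real APIs. The default `fitMultiple` returns one independent `Callable` per paramMap, and the meta transformer decides how many of them run at once.

```scala
import java.util.concurrent.{Callable, Executors}

// Hypothetical stand-in for Spark's ParamMap: parameter name -> value.
final case class ParamMap(values: Map[String, Double]) {
  def apply(name: String): Double = values(name)
}

// Hypothetical marker for a fitted model.
trait Model

// Sketch of the proposed API; a plain Seq[Double] stands in for Dataset[_].
trait Estimator[M <: Model] {
  def fit(dataset: Seq[Double], paramMap: ParamMap): M

  // Default behaviour: one independent Callable per ParamMap, in the same
  // order as `paramMaps`. An estimator with a model-specific optimization
  // would override this to share work across ParamMaps.
  def fitMultiple(dataset: Seq[Double], paramMaps: Array[ParamMap]): Array[Callable[M]] =
    paramMaps.map { pm =>
      new Callable[M] { def call(): M = fit(dataset, pm) }
    }
}

// Toy model/estimator (hypothetical): predicts a scaled mean of the data.
final case class MeanModel(value: Double) extends Model

object MeanEstimator extends Estimator[MeanModel] {
  def fit(dataset: Seq[Double], paramMap: ParamMap): MeanModel =
    MeanModel(paramMap("shrink") * dataset.sum / dataset.size)
}

// How a meta transformer such as CrossValidator might consume fitMultiple:
// submit every Callable to a thread pool, then collect the models in order.
object Tuning {
  def fitAllInParallel[M <: Model](est: Estimator[M], data: Seq[Double],
                                   paramMaps: Array[ParamMap], threads: Int): Seq[M] = {
    val pool = Executors.newFixedThreadPool(threads)
    try {
      val futures = est.fitMultiple(data, paramMaps).toSeq.map(c => pool.submit(c))
      futures.map(_.get())
    } finally pool.shutdown()
  }
}
```

Because the caller only sees `Callable`s, an estimator can change how the work is split without the tuning code changing.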
[~WeichenXu123] suggested modifying the API to
`fitMultiple(dataset: Dataset[_], paramMaps: Array[ParamMap]): Array[Callable[Map[Int, M]]]`.
The reasoning is that allowing each callable to return multiple models will
make it easier to schedule these tasks efficiently in parallel (e.g., we will
avoid scheduling A and B where B simply waits on A).
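To illustrate why returning `Map[Int, M]` can help, here is a hedged sketch with a hypothetical iterative estimator whose paramMaps differ in `maxIter` and `stepSize`. ParamMaps that differ only in `maxIter` are grouped into a single `Callable`, which runs the longest path once and snapshots a model at each requested iteration. Scheduled as separate tasks, the `maxIter = 2` fit would either repeat the first iteration or wait on the `maxIter = 1` task. Treating the `Int` key as the paramMap's index, and every name here (`ParamMap`, `StepModel`, `StepEstimator`, `ParallelTuning`), are assumptions for illustration, not Spark APIs.

```scala
import java.util.concurrent.{Callable, Executors}

// Hypothetical stand-in for Spark's ParamMap: parameter name -> value.
final case class ParamMap(values: Map[String, Double]) {
  def apply(name: String): Double = values(name)
}

// Hypothetical model: the estimate reached after `iterations` steps.
final case class StepModel(estimate: Double, iterations: Int)

// Hypothetical iterative estimator: each step moves the estimate `stepSize`
// of the way toward the data mean, starting from 0.0.
object StepEstimator {
  // Variant returning Callable[Map[Int, M]]: each task may produce several
  // models, keyed (by assumption) by the index of the ParamMap it answers.
  def fitMultiple(data: Seq[Double],
                  paramMaps: Array[ParamMap]): Array[Callable[Map[Int, StepModel]]] = {
    val target = data.sum / data.size
    // ParamMaps that differ only in maxIter share one optimization path.
    val groups = paramMaps.indices.groupBy(i => paramMaps(i).values - "maxIter")
    groups.values.toArray.map { idxs =>
      new Callable[Map[Int, StepModel]] {
        def call(): Map[Int, StepModel] = {
          val step = paramMaps(idxs.head)("stepSize")
          // iteration count -> ParamMap indices that want a model at that point
          val wanted = idxs.groupBy(i => paramMaps(i)("maxIter").toInt)
          val out = Map.newBuilder[Int, StepModel]
          var estimate = 0.0
          for (iter <- 0 to wanted.keys.max) {
            if (iter > 0) estimate += step * (target - estimate)
            // Snapshot a model for every ParamMap whose maxIter is reached now.
            wanted.getOrElse(iter, Seq.empty[Int]).foreach(i => out += (i -> StepModel(estimate, iter)))
          }
          out.result()
        }
      }
    }
  }
}

// A meta transformer can run the tasks in parallel and merge the results.
object ParallelTuning {
  def runAll(tasks: Array[Callable[Map[Int, StepModel]]], threads: Int): Map[Int, StepModel] = {
    val pool = Executors.newFixedThreadPool(threads)
    try tasks.toSeq.map(t => pool.submit(t)).map(_.get()).foldLeft(Map.empty[Int, StepModel])(_ ++ _)
    finally pool.shutdown()
  }
}
```

Merging the per-task maps yields one model per paramMap index, so the caller does not need to know how the estimator grouped the work.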
> Fix model-specific optimization support for ML tuning
> -----------------------------------------------------
>
> Key: SPARK-22126
> URL: https://issues.apache.org/jira/browse/SPARK-22126
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 2.3.0
> Reporter: Weichen Xu
>
> Fix model-specific optimization support for ML tuning. This is discussed in
> SPARK-19357; more discussion is here:
> https://gist.github.com/MrBago/f501b9e7712dc6a67dc9fea24e309bf0
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)