[
https://issues.apache.org/jira/browse/SPARK-19357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16179435#comment-16179435
]
Joseph K. Bradley commented on SPARK-19357:
-------------------------------------------
For memory problems, I agree that either (a) we'd have to do something clever
to handle a sequence of models, such as materializing each model lazily (if
we're returning a Seq) or using callbacks, or (b) we'd have to accept
tradeoffs such as using N times more memory (or spending time dumping models
to disk to avoid the memory cost) to get N times the speedup.
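As a minimal sketch of option (a) in Scala (fitLazily is a hypothetical helper,
not a Spark API): each model is materialized only when the caller asks for it,
so only the model currently being evaluated needs to stay in memory.
{code:scala}
import org.apache.spark.ml.{Estimator, Model}
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.sql.Dataset

// Hypothetical helper, not part of Spark: returns models lazily so that only
// the model currently being evaluated is resident in memory (option (a)),
// instead of fitting and holding all N models at once (option (b)).
def fitLazily[M <: Model[M]](
    estimator: Estimator[M],
    dataset: Dataset[_],
    paramMaps: Array[ParamMap]): Iterator[M] = {
  // Each next() call triggers one fit; models already consumed by the caller
  // can be garbage-collected before the next one is trained.
  paramMaps.iterator.map(estimator.fit(dataset, _))
}
{code}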
For duplicated implementations of parallelization, it's a good point that we'd
have to push the parallelization down into Estimators. The best option I have
thought of is:
* We could have a default implementation of fit() that fits multiple ParamMaps
in parallel (see the sketch after this list). Any Estimator without
model-specific optimizations for multi-model training could use this default.
* Estimators with model-specific optimizations could call into a shared
implementation of parallel fitting, with some implementation overhead from
needing to group subsets of ParamMaps to pass to it.
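A rough sketch of what that shared parallel-fitting helper could look like
(fitParallel is hypothetical, not a Spark API; it uses plain Scala Futures over
a fixed-size thread pool):
{code:scala}
import java.util.concurrent.Executors

import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

import org.apache.spark.ml.{Estimator, Model}
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.sql.Dataset

// Hypothetical shared helper: fits one model per ParamMap, running at most
// `parallelism` fits concurrently. A default multi-model fit() could delegate
// here; Estimators with model-specific optimizations would handle their grouped
// ParamMaps themselves and pass only the remainder.
def fitParallel[M <: Model[M]](
    estimator: Estimator[M],
    dataset: Dataset[_],
    paramMaps: Array[ParamMap],
    parallelism: Int): Seq[M] = {
  val pool = Executors.newFixedThreadPool(parallelism)
  implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
  try {
    val futures = paramMaps.map(pm => Future(estimator.fit(dataset, pm)))
    futures.map(Await.result(_, Duration.Inf)).toSeq
  } finally {
    pool.shutdown()
  }
}
{code}
Note that this returns all N models at once, i.e. it accepts the N-times-memory
tradeoff (b) above rather than the lazy approach (a).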
> Parallel Model Evaluation for ML Tuning: Scala
> ----------------------------------------------
>
> Key: SPARK-19357
> URL: https://issues.apache.org/jira/browse/SPARK-19357
> Project: Spark
> Issue Type: Sub-task
> Components: ML
> Reporter: Bryan Cutler
> Assignee: Bryan Cutler
> Fix For: 2.3.0
>
> Attachments: parallelism-verification-test.pdf
>
>
> This is a first step of the parent task, Optimizations for ML Pipeline
> Tuning, to perform model evaluation in parallel. A simple approach is to
> naively evaluate models in parallel, with a parameter to control the level of
> parallelism. There are some concerns with this:
> * excessive caching of datasets
> * what to set as the default value for the level of parallelism: 1 will
> evaluate all models serially, as is done currently, while higher values
> could lead to excessive caching.
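For reference, a minimal usage sketch of the shape this took in Spark 2.3.0,
assuming the setParallelism param added to the tuning classes by this issue
(training data omitted):
{code:scala}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

val lr = new LogisticRegression()
val grid = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(0.01, 0.1, 1.0))
  .build()

val cv = new CrossValidator()
  .setEstimator(lr)
  .setEvaluator(new BinaryClassificationEvaluator())
  .setEstimatorParamMaps(grid)
  .setNumFolds(3)
  .setParallelism(4)  // default is 1, i.e. serial evaluation as before

// val cvModel = cv.fit(trainingData)  // trainingData: a DataFrame of labeled rows
{code}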