[ https://issues.apache.org/jira/browse/SPARK-21086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097062#comment-16097062 ]
yuhao yang commented on SPARK-21086:
------------------------------------
Sure, indices sound fine.
For driver memory, especially with CrossValidator, caching all the trained
models would be impractical and unnecessary. All the models are collected to
the driver, but fitting is a sequential process, so with the current
implementation of CrossValidator, GC can kick in and reclaim the previous
models. That matters most for large models.
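
To illustrate the memory argument, here is a minimal sketch in plain Python,
not Spark code. FakeModel, fit_model, evaluate and sequential_search are
hypothetical stand-ins, and the keep_models flag is only an assumption about
how "preserve all models" might be made optional. It shows that when only the
metrics are retained, each fitted model becomes unreachable once the next one
is fitted, so the garbage collector is free to reclaim it.

```python
import gc
import weakref


class FakeModel:
    """Hypothetical stand-in for a fitted model; not a Spark class."""

    def __init__(self, param):
        self.param = param
        self.weights = [0.0] * 1000  # pretend this is a large model


def fit_model(param):
    # Stand-in for fitting an estimator with one parameter setting.
    return FakeModel(param)


def evaluate(model):
    # Stand-in for an evaluator; the metric is made up (higher is better).
    return -abs(model.param - 0.1)


def sequential_search(params, keep_models=False):
    """Fit one model per parameter setting, one after another.

    Returns the metrics, the retained models (empty unless keep_models
    is True), and how many fitted models are still alive afterwards.
    """
    metrics, kept, refs = [], [], []
    model = None
    for p in params:
        model = fit_model(p)              # rebinding drops the previous model
        metrics.append(evaluate(model))
        refs.append(weakref.ref(model))   # weak refs do not keep models alive
        if keep_models:
            kept.append(model)            # strong ref: model stays in memory
    model = None                          # drop the last strong reference
    gc.collect()
    alive = sum(1 for r in refs if r() is not None)
    return metrics, kept, alive


if __name__ == "__main__":
    grid = [0.01, 0.1, 1.0]
    _, _, alive = sequential_search(grid)
    print("models alive when only metrics are kept:", alive)
    _, kept, alive = sequential_search(grid, keep_models=True)
    print("models alive when all models are kept:", alive)
```

With keep_models=True every fitted model stays referenced until the caller
drops the returned list, which is the driver-memory cost discussed above.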
> CrossValidator, TrainValidationSplit should preserve all models after fitting
> -----------------------------------------------------------------------------
>
> Key: SPARK-21086
> URL: https://issues.apache.org/jira/browse/SPARK-21086
> Project: Spark
> Issue Type: New Feature
> Components: ML
> Affects Versions: 2.2.0
> Reporter: Joseph K. Bradley
>
> I've heard multiple requests for having CrossValidatorModel and
> TrainValidationSplitModel preserve the full list of fitted models. This
> sounds very valuable.
> One decision should be made before we do this: Should we save and load the
> models in ML persistence? That could blow up the size of a saved Pipeline if
> the models are large.
> * I suggest *not* saving the models by default but allowing saving if
> specified. We could specify whether to save the model as an extra Param for
> CrossValidatorModelWriter, but we would have to make sure to expose
> CrossValidatorModelWriter as a public API and modify the return type of
> CrossValidatorModel.write to be CrossValidatorModelWriter (but this will not
> be a breaking change).
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)