yuhao yang created SPARK-21535:
----------------------------------
Summary: Reduce memory requirement for CrossValidator and
TrainValidationSplit
Key: SPARK-21535
URL: https://issues.apache.org/jira/browse/SPARK-21535
Project: Spark
Issue Type: Improvement
Components: ML
Affects Versions: 2.2.0
Reporter: yuhao yang
CrossValidator and TrainValidationSplit both use
{code}models = est.fit(trainingDataset, epm) {code} to fit the models, where
epm is Array[ParamMap].
Even though the training process is sequential, current implementation consumes
extra driver memory for holding the trained models, which is not necessary and
often leads to memory exception for both CrossValidator and
TrainValidationSplit. My proposal is to changing the training implementation to
train one model at a time, thus that used local model can be collected by GC,
and avoid the unnecessary OOM exceptions.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]