[ https://issues.apache.org/jira/browse/SPARK-22707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279588#comment-16279588 ]
Apache Spark commented on SPARK-22707:
--------------------------------------

User 'WeichenXu123' has created a pull request for this issue:
https://github.com/apache/spark/pull/19904

> Optimize Crossvalidator fitting memory occupation by models
> -----------------------------------------------------------
>
>                 Key: SPARK-22707
>                 URL: https://issues.apache.org/jira/browse/SPARK-22707
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: Weichen Xu
>
> Through some tests I found that CrossValidator still has a memory issue:
> while fitting, it holds on to every model it trains, occupying
> `O(n*sizeof(model))` memory for n models. If well optimized, it should
> occupy only `O(parallelism*sizeof(model))`.
> This is because each entry of `modelFutures` keeps a reference to its model
> object after the future completes (the model can still be fetched with
> `future.value.get.get`), and both `Future.sequence` and the `modelFutures`
> array hold references to every model future. As a result, all model objects
> stay referenced until `fit` returns, so memory usage remains
> `O(n*sizeof(model))`.
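The following Scala sketch shows the retention pattern described in the issue and one way to avoid it. It is a minimal sketch that does not use Spark. `FakeModel`, `fit`, `evaluate`, `retainingFold` and `metricOnlyFold` are hypothetical stand-ins. The metric-only variant illustrates the idea and is not necessarily what the linked pull request implements.

In the first function, each completed Future still holds its model, so every model stays reachable through `value.get.get`. The second function fits and evaluates inside a single Future that returns only the metric. With a fixed pool of `parallelism` threads, at most `parallelism` models are strongly reachable at once.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object CvMemorySketch {
  // Hypothetical stand-in for a fitted model; the payload array makes it "large".
  final case class FakeModel(paramIndex: Int, payload: Array[Double])

  // Hypothetical fit/evaluate helpers; they are not Spark's API.
  def fit(paramIndex: Int): FakeModel =
    FakeModel(paramIndex, Array.fill(1000)(paramIndex.toDouble))

  def evaluate(model: FakeModel): Double =
    model.payload.sum / model.payload.length

  // Pattern described in the issue: each Future's result is the model itself.
  // A completed Future keeps its result, so all n models stay reachable through
  // `modelFutures` for as long as that sequence is referenced.
  def retainingFold(numParams: Int)(implicit ec: ExecutionContext)
      : (Seq[Future[FakeModel]], Seq[Double]) = {
    val modelFutures = (0 until numParams).map(i => Future(fit(i)))
    val metricFutures = modelFutures.map(_.map(evaluate))
    val metrics = Await.result(Future.sequence(metricFutures), Duration.Inf)
    (modelFutures, metrics)
  }

  // Alternative: fit and evaluate inside one Future that yields only the metric.
  // After evaluate() returns, nothing references the model, so only the models
  // of currently running tasks (at most the pool size) are reachable.
  def metricOnlyFold(numParams: Int)(implicit ec: ExecutionContext): Seq[Double] = {
    val metricFutures = (0 until numParams).map { i =>
      Future {
        val model = fit(i)
        evaluate(model)
      }
    }
    Await.result(Future.sequence(metricFutures), Duration.Inf)
  }

  def main(args: Array[String]): Unit = {
    val parallelism = 2
    val pool = Executors.newFixedThreadPool(parallelism)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val (modelFutures, metrics) = retainingFold(4)
      // The completed Future still hands back the whole model object.
      println(modelFutures.head.value.get.get.paramIndex)
      println(metrics)
      println(metricOnlyFold(4))
    } finally pool.shutdown()
  }
}
```

Both functions compute the same metrics. The difference is what each Future holds after it completes: a whole model in the first case, a single `Double` in the second.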