[jira] [Commented] (SPARK-22707) Optimize Crossvalidator fitting memory occupation by models

Apache Spark (JIRA) Tue, 05 Dec 2017 19:20:00 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-22707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279588#comment-16279588
 ]


Apache Spark commented on SPARK-22707:
--------------------------------------

User 'WeichenXu123' has created a pull request for this issue:
https://github.com/apache/spark/pull/19904

> Optimize Crossvalidator fitting memory occupation by models
> -----------------------------------------------------------
>
>                 Key: SPARK-22707
>                 URL: https://issues.apache.org/jira/browse/SPARK-22707
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: Weichen Xu
>
> Testing shows that CrossValidator still has a memory issue: while fitting, it 
> occupies `O(n*sizeof(model))` memory for holding models, whereas a 
> well-optimized implementation should need only `O(parallelism*sizeof(model))`.
> This is because each future in `modelFutures` keeps a reference to its model 
> object after the future completes (the model can still be fetched with 
> `future.value.get.get`), and both `Future.sequence` and the `modelFutures` 
> array hold references to every model future. All model objects therefore stay 
> referenced until `fit` returns, so memory use remains `O(n*sizeof(model))`.
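
One way to avoid the retention described above is to make each future resolve to a small value (for example, the evaluation metric) instead of the model itself, so that a completed future no longer pins a model in memory. The following is a minimal sketch of that idea using only the Scala standard library; it is not taken from the pull request. `ToyModel`, `fitModel`, `evaluate`, and `fitAll` are hypothetical stand-ins for illustration and are not Spark APIs.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// Hypothetical stand-in for a fitted model; not a Spark class.
final case class ToyModel(paramIndex: Int, weights: Array[Double])

object MetricOnlyFutures {
  // Hypothetical stand-in for fitting an estimator with one param map.
  def fitModel(paramIndex: Int): ToyModel =
    ToyModel(paramIndex, Array.fill(1000)(paramIndex.toDouble))

  // Hypothetical stand-in for evaluating a fitted model.
  def evaluate(model: ToyModel): Double =
    model.weights.sum / model.weights.length

  def fitAll(numModels: Int, parallelism: Int): Seq[Double] = {
    val pool = Executors.newFixedThreadPool(parallelism)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // Each future completes with a Double, not with the model. The model is
      // a local value inside the future body, so it becomes unreachable as
      // soon as its metric has been computed.
      val metricFutures: Seq[Future[Double]] = (0 until numModels).map { i =>
        Future {
          val model = fitModel(i)
          evaluate(model)
        }
      }
      // Future.sequence now holds references to metrics only.
      Await.result(Future.sequence(metricFutures), Duration.Inf)
    } finally {
      pool.shutdown()
    }
  }
}
```

With this shape, at most `parallelism` models are being fitted or evaluated at any moment, since the fixed-size pool bounds the number of running future bodies. Completed futures retain only one `Double` each.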



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
