[ 
https://issues.apache.org/jira/browse/SPARK-10629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-10629.
-------------------------------
    Resolution: Duplicate

> Gradient boosted trees: mapPartitions input size increasing 
> ------------------------------------------------------------
>
>                 Key: SPARK-10629
>                 URL: https://issues.apache.org/jira/browse/SPARK-10629
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.4.1
>            Reporter: Wenmin Wu
>
> First of all, I think my problem is quite different from 
> https://issues.apache.org/jira/browse/SPARK-10433, which reports that the input 
> size increases at each iteration.
> My problem is that the mapPartitions input size increases within a single 
> iteration. My training samples have 2958359 features in total. Within one 
> iteration, collectAsMap was called 3 times. Here is a summary of the stages.
> | Stage Id | Description                             | Duration | Input    | Shuffle Read | Shuffle Write |
> |:--------:|:---------------------------------------:|:--------:|:--------:|:------------:|:-------------:|
> | 4        | mapPartitions at DecisionTree.scala:613 | 1.6 h    | 710.2 MB |              | 2.8 GB        |
> | 5        | collectAsMap at DecisionTree.scala:642  | 1.8 min  |          | 2.8 GB       |               |
> | 6        | mapPartitions at DecisionTree.scala:613 | 1.2 h    | 27.0 GB  |              | 5.6 GB        |
> | 7        | collectAsMap at DecisionTree.scala:642  | 2.0 min  |          | 5.6 GB       |               |
> | 8        | mapPartitions at DecisionTree.scala:613 | 1.2 h    | 26.5 GB  |              | 11.1 GB       |
> | 9        | collectAsMap at DecisionTree.scala:642  | 2.0 min  |          | 8.3 GB       |               |
> The mapPartitions operations take far too long, which is strange. I wonder 
> whether there is a bug.
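
To make the reported growth concrete, here is a rough sketch that computes the stage-to-stage ratios from the figures in the summary above. It only restates the reporter's numbers; the 1 GB = 1024 MB conversion and the helper name `growth` are assumptions made for illustration, and nothing here comes from Spark itself.

```python
# Figures copied from the stage summary above, converted to MB.
# Assumption: 1 GB = 1024 MB.
GB = 1024.0

# (stage id, input in MB, shuffle write in MB) for the three mapPartitions stages
stages = [
    (4, 710.2, 2.8 * GB),
    (6, 27.0 * GB, 5.6 * GB),
    (8, 26.5 * GB, 11.1 * GB),
]


def growth(values):
    """Return the ratio of each value to the one before it (hypothetical helper)."""
    return [later / earlier for earlier, later in zip(values, values[1:])]


input_growth = growth([s[1] for s in stages])
shuffle_growth = growth([s[2] for s in stages])

# Report how much each later stage grew relative to the previous mapPartitions stage.
for (stage_id, _, _), i, s in zip(stages[1:], input_growth, shuffle_growth):
    print(f"stage {stage_id}: input x{i:.1f}, shuffle write x{s:.1f}")
```

Under these assumptions the input grows by a factor of roughly 39 between stages 4 and 6 and then stays about flat, while the shuffle write roughly doubles at each mapPartitions stage.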



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

