[ https://issues.apache.org/jira/browse/SPARK-10629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-10629.
-------------------------------
    Resolution: Duplicate

> Gradient boosted trees: mapPartitions input size increasing
> ------------------------------------------------------------
>
>                 Key: SPARK-10629
>                 URL: https://issues.apache.org/jira/browse/SPARK-10629
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 1.4.1
>            Reporter: Wenmin Wu
>
> First of all, I think my problem is quite different from
> https://issues.apache.org/jira/browse/SPARK-10433, which reports that the
> input size increases at each iteration.
> My problem is that the mapPartitions input size increases within one
> iteration. My training samples have 2958359 features in total. Within one
> iteration, three collectAsMap operations were called. Here is a summary of
> each call.
>
> | Stage Id | Description                             | Duration | Input    | Shuffle Read | Shuffle Write |
> |:--------:|:---------------------------------------:|:--------:|:--------:|:------------:|:-------------:|
> | 4        | mapPartitions at DecisionTree.scala:613 | 1.6 h    | 710.2 MB |              | 2.8 GB        |
> | 5        | collectAsMap at DecisionTree.scala:642  | 1.8 min  |          | 2.8 GB       |               |
> | 6        | mapPartitions at DecisionTree.scala:613 | 1.2 h    | 27.0 GB  |              | 5.6 GB        |
> | 7        | collectAsMap at DecisionTree.scala:642  | 2.0 min  |          | 5.6 GB       |               |
> | 8        | mapPartitions at DecisionTree.scala:613 | 1.2 h    | 26.5 GB  |              | 11.1 GB       |
> | 9        | collectAsMap at DecisionTree.scala:642  | 2.0 min  |          | 8.3 GB       |               |
>
> The mapPartitions operation takes far too long, which is very strange. I
> wonder whether there is a bug.
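
For readers comparing the stages, here is a minimal sketch (not part of the original report) that works out how much the shuffle write grows from one mapPartitions stage to the next, using only the figures quoted in the table. The names `shuffle_write_gb` and `growth_ratios` are made up for illustration, and the script draws no conclusion about the cause.

```python
# Shuffle write sizes in GB for the three mapPartitions stages (4, 6, 8),
# copied from the table in the issue description.
shuffle_write_gb = {4: 2.8, 6: 5.6, 8: 11.1}


def growth_ratios(sizes):
    """Return, for each stage after the first, its size divided by the previous stage's size."""
    stages = sorted(sizes)
    return {cur: sizes[cur] / sizes[prev] for prev, cur in zip(stages, stages[1:])}


ratios = growth_ratios(shuffle_write_gb)
for stage, ratio in ratios.items():
    print(f"stage {stage}: {ratio:.2f}x the previous mapPartitions shuffle write")
```

On the reported numbers the shuffle write roughly doubles at each step, while the stage input goes from 710.2 MB to about 27 GB after the first step and then stays nearly flat.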