[ https://issues.apache.org/jira/browse/SPARK-2916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099274#comment-14099274 ]
Patrick Wendell commented on SPARK-2916:
----------------------------------------

Just to document for posterity - this was narrowed down and is just a symptom of SPARK-3015.

> [MLlib] While running regression tests with dense vectors of length greater than 1000, the treeAggregate blows up after several iterations
> ------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-2916
>                 URL: https://issues.apache.org/jira/browse/SPARK-2916
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib, Spark Core
>            Reporter: Burak Yavuz
>            Priority: Blocker
>
> While running any of the regression algorithms with gradient descent, the treeAggregate blows up after several iterations.
> Observed on an EC2 cluster with 16 nodes, matrix dimensions of 1,000,000 x 5,000.
> In order to replicate the problem, use aggregate multiple times, maybe over 50-60 times.
> Testing led to a possible workaround: setting `spark.cleaner.referenceTracking false` seems to help. So the problem is most probably related to the cleanup.

--
This message was sent by Atlassian JIRA (v6.2#6252)
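
For reference, the workaround mentioned in the report can be applied as a Spark configuration property. This is a sketch of the two usual ways to set it (the application name and master URL here are placeholders, not from the original report); disabling `spark.cleaner.referenceTracking` turns off the ContextCleaner's reference-based cleanup of shuffles and broadcasts, so it should be treated as a diagnostic step rather than a permanent fix:

```
# Option 1: pass the property at submit time
spark-submit --conf spark.cleaner.referenceTracking=false ...

# Option 2: add it to conf/spark-defaults.conf
spark.cleaner.referenceTracking   false
```

Note that with reference tracking disabled, shuffle and broadcast state accumulates until the application exits, which can exhaust disk or memory in long-running jobs.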