[ https://issues.apache.org/jira/browse/SPARK-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mridul Muralidharan updated SPARK-6165:
---------------------------------------
    Summary: Aggregate and reduce should be able to work with very large number of tasks.  (was: Aggregate and reduce should spool to disk and complete)

> Aggregate and reduce should be able to work with very large number of tasks.
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-6165
>                 URL: https://issues.apache.org/jira/browse/SPARK-6165
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.4.0
>            Reporter: Mridul Muralidharan
>            Priority: Minor
>
> To prevent data from workers causing OOM at master, we have the property
> 'spark.driver.maxResultSize'.
> But the OOM at master can be due to two reasons:
> a) Data being sent from workers is too large, causing OOM at master.
> b) A large number of moderate (to low) sized results being sent to master,
> causing OOM (for example: 500k tasks, 1k each).
> spark.driver.maxResultSize protects against both, but (b) should be handled
> more gracefully by master: for example, spool the results to disk, aggregate
> without waiting for the entire result set to be fetched, etc.
> Currently we are forced to use treeReduce and co to work around this problem,
> which adds to the latency of jobs.
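
To make case (b) concrete, here is a rough sketch in plain Python. It is not Spark code and makes no claim about how the Spark driver is implemented. It only models the three strategies the description contrasts: buffering every task result before reducing, merging results as they arrive, and a treeReduce-style multi-round merge. All function names are hypothetical, and "1k each" is assumed to mean 1 KiB per result.

```python
# Illustrative model only: plain Python, not Spark code.
# Every name below is hypothetical and chosen for this sketch.
from functools import reduce
from typing import Callable, Iterable, Tuple, TypeVar

T = TypeVar("T")

NUM_TASKS = 500_000    # "500k tasks" from the issue description
RESULT_BYTES = 1024    # "1k each", assumed here to mean 1 KiB


def total_result_bytes(num_tasks: int = NUM_TASKS,
                       result_bytes: int = RESULT_BYTES) -> int:
    """Bytes held at once if every task result is buffered before reducing."""
    return num_tasks * result_bytes


def buffered_reduce(results: Iterable[T],
                    op: Callable[[T, T], T]) -> Tuple[T, int]:
    """Collect all results, then reduce. Returns (value, results held at peak)."""
    buffered = list(results)          # the whole result set is resident here
    return reduce(op, buffered), len(buffered)


def streaming_reduce(results: Iterable[T],
                     op: Callable[[T, T], T]) -> Tuple[T, int]:
    """Merge each result as it arrives. Assumes at least one result."""
    it = iter(results)
    acc = next(it)
    for r in it:
        acc = op(acc, r)              # only the running accumulator is kept
    return acc, 1


def tree_reduce(results: Iterable[T], op: Callable[[T, T], T],
                fan_in: int = 100) -> Tuple[T, int]:
    """Merge in rounds of `fan_in`. Returns (value, number of merge rounds).

    Assumes at least one result. Each round stands in for one extra
    stage of work, which is where the added latency comes from.
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(results)
    rounds = 0
    while len(level) > 1:
        level = [reduce(op, level[i:i + fan_in])
                 for i in range(0, len(level), fan_in)]
        rounds += 1
    return level[0], rounds


if __name__ == "__main__":
    add = lambda a, b: a + b
    print("bytes if fully buffered:", total_result_bytes())
    print("streaming:", streaming_reduce(range(NUM_TASKS), add))
    print("tree:", tree_reduce(range(NUM_TASKS), add, fan_in=100))
```

Under these assumptions, fully buffering 500,000 results of 1 KiB each means holding 512,000,000 bytes at once, while the streaming variant keeps a single accumulator. The tree variant needs three merge rounds at a fan-in of 100 (500,000 to 5,000 to 50 to 1), which models the extra latency the description attributes to the treeReduce workaround.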