[ https://issues.apache.org/jira/browse/SPARK-36419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan reassigned SPARK-36419: ----------------------------------- Assignee: Aravind Patnam > Move final aggregation in RDD.treeAggregate to executor > ------------------------------------------------------- > > Key: SPARK-36419 > URL: https://issues.apache.org/jira/browse/SPARK-36419 > Project: Spark > Issue Type: Improvement > Components: Spark Core, Tests > Affects Versions: 3.0.0 > Reporter: Aravind Patnam > Assignee: Aravind Patnam > Priority: Minor > Fix For: 3.3.0 > > > For the last iteration in RDD.treeAggregate, spark relies on RDD.fold as an > implementation detail. > RDD.fold pulls all shuffle partitions to the driver to merge the result. > There are two concerns with this: > a) Shuffle machinery at executors is much more robust/fault tolerant compared > to fetching results to driver. > b) Driver is single point of failure in a spark application. When this > results in nontrivial increase in memory pressure while pulling partitions to > driver or increased memory usage as part of computing the aggregated state > (in user code), it can result in driver failures. > For treeAggregate, instead of relying on fold for the last iteration, we > should (optionally) do the computation at a single reducer - and fetch the > final result to driver. > Additional cost: one extra stage with a single (resulting) partition. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org