[ https://issues.apache.org/jira/browse/SPARK-30713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-30713: ---------------------------------- Affects Version/s: (was: 3.0.0) 3.1.0 > Respect mapOutputSize in memory in adaptive execution > ----------------------------------------------------- > > Key: SPARK-30713 > URL: https://issues.apache.org/jira/browse/SPARK-30713 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.1.0 > Reporter: liupengcheng > Priority: Major > > Currently, Spark adaptive execution use the MapOutputStatistics information > to adjust the plan dynamically, but this MapOutputSize does not respect the > compression factor. So there are cases that the original SparkPlan is > `SortMergeJoin`, but the Plan after adaptive adjustment was changed to > `BroadcastHashJoin`, but this `BroadcastHashJoin` might causing OOMs due to > inaccurate estimation. > > Also, if the shuffle implementation is local shuffle(intel Spark-Adaptive > execution impl), then in some cases, it will cause `Too large Frame` > exception. > > So I propose to respect the compression factor in adaptive execution, or use > `dataSize` metrics in `ShuffleExchangeExec` in adaptive execution. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org