[ https://issues.apache.org/jira/browse/SPARK-25947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan resolved SPARK-25947.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0

Issue resolved by pull request 22961
[https://github.com/apache/spark/pull/22961]

> Reduce memory usage in ShuffleExchangeExec by selecting only the sort columns
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-25947
>                 URL: https://issues.apache.org/jira/browse/SPARK-25947
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.2
>            Reporter: Shuheng Dai
>            Priority: Major
>             Fix For: 3.0.0
>
>
> When sorting rows, ShuffleExchangeExec uses the entire row instead of just
> the columns referenced in SortOrder to create the RangePartitioner. This
> causes the RangePartitioner to sample entire rows to create rangeBounds and
> can cause out-of-memory (OOM) errors on the driver when rows contain large
> fields.
> Instead, create a projection and use only the columns involved in the
> SortOrder for the RangePartitioner.
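
The idea in the issue description can be illustrated with a small sketch. This is plain Python, not Spark's API and not the code from pull request 22961; the helper names (project_sort_key, range_bounds, partition_for), the row layout, and the sample sizes are all hypothetical and chosen only for illustration. It shows that range bounds depend only on the sort columns, so the sample that is collected to compute them can hold projected keys instead of whole rows.

```python
import random
from bisect import bisect_left


def project_sort_key(row, sort_columns):
    """Keep only the columns that the sort order references."""
    return tuple(row[c] for c in sort_columns)


def range_bounds(keys, num_partitions):
    """Pick num_partitions - 1 evenly spaced split points from sampled keys."""
    ordered = sorted(keys)
    step = len(ordered) / num_partitions
    return [ordered[int(step * i)] for i in range(1, num_partitions)]


def partition_for(key, bounds):
    """Index of the range partition a key falls into (key <= bounds[i] -> i)."""
    return bisect_left(bounds, key)


# Hypothetical data: a small sort column plus a large field on every row.
rng = random.Random(0)
rows = [
    {"id": rng.randrange(1_000_000), "payload": "x" * 10_000}
    for _ in range(1_000)
]
sort_columns = ["id"]

# Project each sampled row down to its sort key before retaining it,
# so the collected sample never carries the large "payload" field.
sample_keys = [project_sort_key(r, sort_columns) for r in rng.sample(rows, 100)]
bounds = range_bounds(sample_keys, 4)

# Every row can still be routed using only its projected key.
assignments = [partition_for(project_sort_key(r, sort_columns), bounds) for r in rows]
```

In this sketch the retained sample is 100 one-element tuples rather than 100 rows of roughly 10,000 characters each, which mirrors the driver-side memory saving the issue is after.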