[ https://issues.apache.org/jira/browse/SPARK-25947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-25947:
------------------------------------

    Assignee: Apache Spark

> Reduce memory usage in ShuffleExchangeExec by selecting only the sort columns
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-25947
>                 URL: https://issues.apache.org/jira/browse/SPARK-25947
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.2
>            Reporter: Shuheng Dai
>            Assignee: Apache Spark
>            Priority: Major
>
> When sorting rows, ShuffleExchangeExec uses the entire row, instead of just
> the columns referenced in SortOrder, to create the RangePartitioner. As a
> result, the RangePartitioner samples entire rows to compute rangeBounds,
> which can cause OOM issues on the driver when rows contain large fields.
> Proposed fix: create a projection and use only the columns involved in the
> SortOrder for the RangePartitioner.
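
The proposal in the description can be illustrated with a small sketch. This is plain Python, not Spark code and not the actual patch: it only shows the idea of projecting each row down to its sort columns before sampling, so that the sample used to compute range bounds never holds the large fields. All names here (sample_sort_keys, range_bounds, partition_for, sort_ordinals) are hypothetical and chosen for illustration.

```python
import bisect
import random
from typing import Iterable, List, Sequence, Tuple


def sample_sort_keys(rows: Iterable[Sequence], sort_ordinals: Sequence[int],
                     sample_size: int, seed: int = 0) -> List[Tuple]:
    """Reservoir-sample the projected sort keys; full rows are never retained."""
    rng = random.Random(seed)
    reservoir: List[Tuple] = []
    for n, row in enumerate(rows):
        # Projection: keep only the columns referenced by the sort order.
        key = tuple(row[i] for i in sort_ordinals)
        if n < sample_size:
            reservoir.append(key)
        else:
            j = rng.randint(0, n)  # inclusive on both ends
            if j < sample_size:
                reservoir[j] = key
    return reservoir


def range_bounds(sampled_keys: Sequence[Tuple], num_partitions: int) -> List[Tuple]:
    """Pick num_partitions - 1 evenly spaced keys from the sorted sample."""
    ordered = sorted(sampled_keys)
    if not ordered or num_partitions <= 1:
        return []
    return [ordered[min(len(ordered) - 1, p * len(ordered) // num_partitions)]
            for p in range(1, num_partitions)]


def partition_for(key: Tuple, bounds: Sequence[Tuple]) -> int:
    """Index of the range partition that a projected key falls into."""
    return bisect.bisect_left(bounds, key)


# Each row carries a small sort column (index 0) and a large payload (index 1).
rows = ((i, "x" * 1000) for i in range(100))
keys = sample_sort_keys(rows, sort_ordinals=[0], sample_size=100)
bounds = range_bounds(keys, num_partitions=4)
```

The sample held in memory consists of one-element tuples rather than rows with the 1000-character payload, which is the memory saving the issue asks for. The full rows are still what gets shuffled; only the data used to choose the bounds is narrowed.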