Liang-Chi Hsieh created SPARK-19274:
---------------------------------------
Summary: Make GlobalLimit without shuffling data to single
partition
Key: SPARK-19274
URL: https://issues.apache.org/jira/browse/SPARK-19274
Project: Spark
Issue Type: Improvement
Components: SQL
Reporter: Liang-Chi Hsieh
A logical Limit is performed actually by two physical operations LocalLimit and
GlobalLimit.
In most of time, before GlobalLimit, we will perform a shuffle exchange to
shuffle data to single partition. When the limit number is not trivially small,
this shuffling is costing.
This change tried to perform GlobalLimit without shuffling data to single
partition. The approach is similar to SparkPlan.executeTake. It iterates part
of partitions until it reaches enough data.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]