Liang-Chi Hsieh created SPARK-19274:
---------------------------------------

             Summary: Make GlobalLimit without shuffling data to single 
partition
                 Key: SPARK-19274
                 URL: https://issues.apache.org/jira/browse/SPARK-19274
             Project: Spark
          Issue Type: Improvement
          Components: SQL
            Reporter: Liang-Chi Hsieh


A logical Limit is performed actually by two physical operations LocalLimit and 
GlobalLimit.

In most of time, before GlobalLimit, we will perform a shuffle exchange to 
shuffle data to single partition. When the limit number is not trivially small, 
this shuffling is costing.

This change tried to perform GlobalLimit without shuffling data to single 
partition. The approach is similar to SparkPlan.executeTake. It iterates part 
of partitions until it reaches enough data.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to