The rule SpecialLimits converts GlobalLimit / LocalLimit into
CollectLimitExec:

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L74-L75

With this plan, Spark will not scan all the records. CollectLimitExec
should behave as you expect.
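
As a quick check (a sketch; `A` stands for your table), print the physical
plan and look for CollectLimit at the root:

    // spark-shell: the planned query should be rooted at CollectLimit.
    // Its collect path uses executeTake, which scans partitions
    // incrementally instead of reading the whole table up front.
    val df = spark.sql("SELECT * FROM A LIMIT 500")
    df.explain()
    // Expected output, roughly:
    // == Physical Plan ==
    // CollectLimit 500
    // +- ...scan of A...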

Thanks,

Xiao

2016-10-23 20:40 GMT-07:00 Liz Bai <liz...@icloud.com>:

> Hi all,
>
> Let me clarify the problem:
>
> Suppose we have a simple table `A` with 100,000,000 records.
>
> Problem:
> When we execute the SQL query `SELECT * FROM A LIMIT 500`,
> it scans through all 100,000,000 records.
> The expected behaviour is that the engine stops scanning as soon as 500
> records have been found.
>
> Detailed observation:
> We found the “GlobalLimit / LocalLimit” physical operators here:
> https://github.com/apache/spark/blob/branch-2.0/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala
> But during query planning, GlobalLimit / LocalLimit are not applied to
> the physical plan.
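>
> For reference, here is how we inspected the plans (a sketch; `A` is our
> test table):
>
>     // spark-shell: print all plan phases for the limited query.
>     // GlobalLimit / LocalLimit appear in the optimized logical plan,
>     // but we could not find a matching limit operator in the physical plan.
>     spark.sql("SELECT * FROM A LIMIT 500").explain(true)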
>
> Could you please help us look into this LIMIT problem?
> Thanks.
>
> Best,
> Liz
>
> On 23 Oct 2016, at 10:11 PM, Xiao Li <gatorsm...@gmail.com> wrote:
>
> Hi, Liz,
>
> CollectLimit means “Take the first `limit` elements and collect them to a
> single partition.”
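>
> For example (a sketch; `df` stands for any DataFrame over your table):
>
>     // Both forms plan a CollectLimit at the root and bring at most
>     // 500 rows back to the driver.
>     val rows = df.limit(500).collect()
>     // SQL equivalent: spark.sql("SELECT * FROM A LIMIT 500").collect()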
>
> Thanks,
>
> Xiao
>
> 2016-10-23 5:21 GMT-07:00 Ran Bai <liz...@icloud.com>:
>
>> Hi all,
>>
>> I found that the runtime of a query with or without the “LIMIT” keyword is
>> the same. We looked into it and found that “GlobalLimit / LocalLimit” do
>> appear in the logical plan, but there are no corresponding physical
>> operators. Is this a bug or something else? Attached are the logical and
>> physical plans from running "SELECT * FROM seq LIMIT 1".
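>>
>> For example, this is how we timed it (a sketch; `seq` is our test table
>> and `time` is a hypothetical helper, not a Spark API):
>>
>>     // Wall-clock a block of code (defined only for this test).
>>     def time[T](body: => T): T = {
>>       val start = System.nanoTime()
>>       val result = body
>>       println(s"took ${(System.nanoTime() - start) / 1e9} s")
>>       result
>>     }
>>     time { spark.sql("SELECT * FROM seq LIMIT 1").collect() }
>>     time { spark.sql("SELECT * FROM seq").collect() }  // about as slow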
>>
>>
>> More specifically, we expected an early stop once adequate results had
>> been collected.
>> Thanks so much.
>>
>> Best,
>> Liz
>>