- dev + user

Can you give more info about the query?  Maybe a full explain()?  Are you
using a data source like JDBC?  The JDBC data source API does not currently
push down limits, but the documentation describes how you can use a query
instead of a table if that is what you are looking to do.
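For example, something along these lines (an untested sketch; the JDBC URL,
credentials, database, and the alias on the subquery are just placeholders
for whatever your setup looks like):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("limit-check").getOrCreate()

    // Print the full (parsed / analyzed / optimized / physical) plan for the query
    spark.sql("SELECT * FROM A LIMIT 500").explain(true)

    // With the JDBC source, put the limit in the query itself so the database
    // applies it, rather than reading the whole table into Spark:
    val limited = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/mydb")    // placeholder
      .option("user", "dbuser")                               // placeholder
      .option("password", "dbpass")                           // placeholder
      .option("dbtable", "(SELECT * FROM A LIMIT 500) AS t")  // query in place of a table
      .load()

    limited.show()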

On Mon, Oct 24, 2016 at 5:40 AM, Liz Bai <liz...@icloud.com> wrote:

> Hi all,
>
> Let me clarify the problem:
>
> Suppose we have a simple table `A` with 100 000 000 records
>
> Problem:
> When we execute the SQL query `select * from A limit 500`,
> it scans through all 100 000 000 records.
> The expected behaviour is that once 500 records are found, the engine
> stops scanning.
>
> Detailed observation:
> We found that there are “GlobalLimit / LocalLimit” physical operators in
> https://github.com/apache/spark/blob/branch-2.0/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala
> but during query plan generation GlobalLimit / LocalLimit are not applied
> to the query plan.
>
> Could you please help us to inspect LIMIT problem?
> Thanks.
>
> Best,
> Liz
>
> On 23 Oct 2016, at 10:11 PM, Xiao Li <gatorsm...@gmail.com> wrote:
>
> Hi, Liz,
>
> CollectLimit means “take the first `limit` elements and collect them to a
> single partition.”
>
> Thanks,
>
> Xiao
>
> 2016-10-23 5:21 GMT-07:00 Ran Bai <liz...@icloud.com>:
>
>> Hi all,
>>
>> I found the runtime of a query with or without the “LIMIT” keyword is the
>> same. We looked into it and found that there are “GlobalLimit /
>> LocalLimit” operators in the logical plan, but no corresponding operators
>> in the physical plan. Is this a bug or something else? Attached are the
>> logical and physical plans when running "SELECT * FROM seq LIMIT 1".
>>
>>
>> More specifically, we expected an early stop once enough results have
>> been collected.
>> Thanks so much.
>>
>> Best,
>> Liz
>>
>>
>
>
>
