- dev + user

Can you give more info about the query? Maybe a full explain()? Are you using a data source like JDBC? The API does not currently push down limits, but the documentation describes how you can use a query instead of a table if that is what you are looking to do.
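For what it's worth, a minimal sketch of the query-instead-of-table approach: pass a subquery as the `dbtable` option so the database itself applies the limit, rather than relying on Spark to push it down. The connection URL, driver, and the table name `A` below are placeholders.

```scala
// Sketch: instead of loading the whole table and calling .limit(500) on the
// DataFrame, embed the limit in the SQL sent to the database. A parenthesized
// subquery with an alias is accepted wherever a table name is.
val limited = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://dbhost/mydb") // placeholder connection
  .option("dbtable", "(SELECT * FROM A LIMIT 500) AS t")
  .load()
```

With this form the database returns at most 500 rows, so Spark never scans the full table regardless of how the limit operators are planned.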
On Mon, Oct 24, 2016 at 5:40 AM, Liz Bai <liz...@icloud.com> wrote:
> Hi all,
>
> Let me clarify the problem:
>
> Suppose we have a simple table `A` with 100,000,000 records.
>
> Problem:
> When we execute the SQL query `select * from A limit 500`,
> it scans through all 100,000,000 records.
> The normal behaviour should be that once 500 records are found, the
> engine stops scanning.
>
> Detailed observation:
> We found that there are "GlobalLimit / LocalLimit" physical operators:
> https://github.com/apache/spark/blob/branch-2.0/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala
> But during query plan generation, GlobalLimit / LocalLimit is not applied
> to the query plan.
>
> Could you please help us to inspect the LIMIT problem?
> Thanks.
>
> Best,
> Liz
>
> On 23 Oct 2016, at 10:11 PM, Xiao Li <gatorsm...@gmail.com> wrote:
>
> Hi, Liz,
>
> CollectLimit means "Take the first `limit` elements and collect them to a
> single partition."
>
> Thanks,
>
> Xiao
>
> 2016-10-23 5:21 GMT-07:00 Ran Bai <liz...@icloud.com>:
>
>> Hi all,
>>
>> I found the runtime for a query with or without the "LIMIT" keyword is
>> the same. We looked into it and found that there is actually
>> "GlobalLimit / LocalLimit" in the logical plan, but no corresponding
>> physical plan. Is this a bug or something else? Attached are the logical
>> and physical plans when running "SELECT * FROM seq LIMIT 1".
>>
>> More specifically, we expected an early stop upon getting adequate
>> results. Thanks so much.
>>
>> Best,
>> Liz
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
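To see which limit operators the planner actually produces for a case like the one above, the plans can be printed directly. A small sketch, assuming an existing SparkSession named `spark`; the table size and limit value are just illustrative:

```scala
// Sketch: build a large single-column Dataset and inspect the plans for a
// bare LIMIT query. explain(true) prints the parsed, analyzed, optimized
// logical plans and the physical plan, so you can check whether
// GlobalLimit/LocalLimit survive into the physical plan or whether the
// top-level limit is planned as CollectLimit instead.
val df = spark.range(100000000L).toDF("id")
df.limit(500).explain(true)
```

Whether an early stop happens then depends on how that physical operator executes, not on the logical plan alone, which is why the full explain() output is the most useful thing to post here.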