Hi all,

Let me clarify the problem:
Suppose we have a simple table `A` with 100,000,000 records.

Problem: when we execute the SQL query `SELECT * FROM A LIMIT 500`, it scans through all 100,000,000 records. The expected behaviour is that the engine stops scanning once 500 records have been found.

Detailed observation: we found that there are GlobalLimit / LocalLimit physical operators in
https://github.com/apache/spark/blob/branch-2.0/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala
but during query planning they are not applied to the physical plan.

Could you please help us inspect the LIMIT problem? Thanks.

Best,
Liz

> On 23 Oct 2016, at 10:11 PM, Xiao Li <gatorsm...@gmail.com> wrote:
>
> Hi, Liz,
>
> CollectLimit means `Take the first `limit` elements and collect them to a
> single partition.`
>
> Thanks,
>
> Xiao
>
> 2016-10-23 5:21 GMT-07:00 Ran Bai <liz...@icloud.com>:
> Hi all,
>
> I found the runtime for a query with or without the “LIMIT” keyword is the
> same. We looked into it and found that there actually is “GlobalLimit /
> LocalLimit” in the logical plan, but no corresponding operator in the
> physical plan. Is this a bug or something else? Attached are the logical
> and physical plans when running "SELECT * FROM seq LIMIT 1".
>
> More specifically, we expected an early stop once adequate results have
> been collected. Thanks so much.
>
> Best,
> Liz
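For anyone who wants to reproduce this and look at the plans themselves, below is a minimal sketch. It assumes Spark 2.0 running in local mode; the table name `seq` follows the attached plans, while the row count, column name, and object name are illustrative, not taken from the original report.

import org.apache.spark.sql.SparkSession

object LimitPlanInspection {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("limit-plan-inspection")
      .master("local[*]")
      .getOrCreate()

    // Register a temp view standing in for the large table from the report
    // (fewer rows here so the sketch runs quickly).
    spark.range(0L, 10000000L).toDF("id").createOrReplaceTempView("seq")

    val df = spark.sql("SELECT * FROM seq LIMIT 1")

    // Prints the parsed, analyzed, and optimized logical plans plus the
    // physical plan. The logical plans should contain GlobalLimit/LocalLimit,
    // while the physical plan for a LIMIT that ends the query shows a
    // CollectLimit node instead.
    df.explain(true)

    df.collect().foreach(println)

    spark.stop()
  }
}

The CollectLimit node in the physical plan matches Xiao's description above: in the 2.0 planner, when LIMIT is the final operator of the query, the GlobalLimit/LocalLimit pair is replaced by a single operator that takes the first `limit` rows and collects them to one partition, rather than being planned as separate physical operators.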