The rule SpecialLimits converts GlobalLimit / LocalLimit into CollectLimitExec:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L74-L75
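
A quick way to confirm this is to print the physical plan for the query. Here is a minimal sketch (the local-mode session and the synthetic table `A` are made up for illustration):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("limit-plan-check")
      .master("local[*]")
      .getOrCreate()

    // Register a synthetic table `A` standing in for your 100,000,000-row table.
    spark.range(100000000L).toDF("id").createOrReplaceTempView("A")

    // Print the physical plan; its root should be CollectLimit 500 rather
    // than GlobalLimit/LocalLimit over a full scan.
    spark.sql("SELECT * FROM A LIMIT 500").explain()

When the result is actually fetched (e.g. with collect()), CollectLimitExec should stop early: it takes the first 500 rows by scanning partitions incrementally instead of launching a job over every partition.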

Spark will not scan all the records based on your plan. CollectLimitExec should behave as you expected.

Thanks,

Xiao

2016-10-23 20:40 GMT-07:00 Liz Bai <liz...@icloud.com>:

> Hi all,
>
> Let me clarify the problem:
>
> Suppose we have a simple table `A` with 100,000,000 records.
>
> Problem:
> When we execute the SQL query `select * from A Limit 500`,
> it scans through all 100,000,000 records.
> The normal behaviour should be that once 500 records are found, the
> engine stops scanning.
>
> Detailed observation:
> We found that there are “GlobalLimit / LocalLimit” physical operators
> https://github.com/apache/spark/blob/branch-2.0/sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala
> but during query plan generation, GlobalLimit / LocalLimit is not applied
> to the query plan.
>
> Could you please help us inspect the LIMIT problem?
> Thanks.
>
> Best,
> Liz
>
> On 23 Oct 2016, at 10:11 PM, Xiao Li <gatorsm...@gmail.com> wrote:
>
> Hi, Liz,
>
> CollectLimit means “Take the first `limit` elements and collect them to a
> single partition.”
>
> Thanks,
>
> Xiao
>
> 2016-10-23 5:21 GMT-07:00 Ran Bai <liz...@icloud.com>:
>
>> Hi all,
>>
>> I found the runtime for a query with or without the “LIMIT” keyword is
>> the same. We looked into it and found that there actually is “GlobalLimit
>> / LocalLimit” in the logical plan, however no relevant physical plan
>> there. Is this a bug or something else? Attached are the logical and
>> physical plans when running "SELECT * FROM seq LIMIT 1".
>>
>> More specifically, we expected an early stop upon getting adequate
>> results. Thanks so much.
>>
>> Best,
>> Liz