Sorry for the typo in my last mail.
Compared with Query-2, we have questions about Query-1 and Query-3.
Also, may I know the difference between CollectLimit and BaseLimit?
Thanks so much.

Best,
Liz
> On 26 Oct 2016, at 7:25 PM, Liz Bai <liz...@icloud.com> wrote:
> 
> Hi all,
> 
> We used Parquet and Spark 2.0 for the testing. The table below summarizes
> what we have found about the `Limit` keyword. Query-2 shows that SparkSQL
> stops early once it has enough results. But we are curious about
> Query-1 and Query-2.
*But we are curious about Query-1 and Query-3.
> It seems that either writing the result RDD as Parquet or filtering on columns
> leads to scanning much more data.
> No. | SQL statement                                  | Filter | Method of saving result | Runtime (s) | Input data size
> 1   | select ColA from Table limit 1                 | no     | writeParquet            | 216         | 205MB
> 2   | select ColA from Table limit 1                 | no     | Collect                 | 22          | 38.3KB
> 3   | select ColA from Table where ColB = 50 limit 1 | yes    | Collect                 | 229         | 1776.4MB
> We are wondering whether this is a bug or something else. Could you please
> help with it?
> Thanks.
> 
> Best regards,
> Liz
