[
https://issues.apache.org/jira/browse/FLINK-22143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kurt Young closed FLINK-22143.
------------------------------
Resolution: Fixed
fixed: 3f4dd8229436ad2612df820f1bd83cc35e6325ac
> Flink returns fewer rows than expected when using limit in SQL
> --------------------------------------------------------------
>
> Key: FLINK-22143
> URL: https://issues.apache.org/jira/browse/FLINK-22143
> Project: Flink
> Issue Type: Bug
> Components: Connectors / FileSystem, Table SQL / Runtime
> Affects Versions: 1.13.0
> Reporter: Peng Yu
> Assignee: Peng Yu
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.13.0
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> Flink's blink runtime returns fewer rows than expected when querying Hive
> tables with LIMIT.
> {code:sql}
> select i_item_sk from tpcds_1g_snappy.item limit 5000;
> {code}
>
> The above query returns only *4998* rows in some cases.
>
> This problem can be reproduced under the following conditions (a minimal
> reproduction sketch follows the list):
> # The Hive table is stored in Parquet format.
> # The SQL query uses a limit and runs on the blink planner (Flink 1.12.0 or
> later).
> # The input table is small, with only 1 data file containing a single row
> group (e.g. 1 GB of TPC-DS benchmark data).
> # The row count requested by `limit` exceeds the batch size (2048 by
> default).
>
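> For reference, a minimal Java sketch of the reproduction path. The catalog
> name, default database, and Hive conf directory below are illustrative
> assumptions, not values taken from this report:
> {code:java}
> import org.apache.flink.table.api.EnvironmentSettings;
> import org.apache.flink.table.api.TableEnvironment;
> import org.apache.flink.table.catalog.hive.HiveCatalog;
>
> public class LimitRepro {
>     public static void main(String[] args) {
>         // Batch job on the blink planner, as described in the conditions above.
>         TableEnvironment tEnv = TableEnvironment.create(
>                 EnvironmentSettings.newInstance().inBatchMode().build());
>
>         // Hypothetical catalog name and Hive conf directory.
>         HiveCatalog hive = new HiveCatalog("myhive", "tpcds_1g_snappy", "/opt/hive-conf");
>         tEnv.registerCatalog("myhive", hive);
>         tEnv.useCatalog("myhive");
>
>         // LIMIT 5000 is above the default batch size of 2048, which is the
>         // condition under which the missing rows show up.
>         tEnv.executeSql("select i_item_sk from tpcds_1g_snappy.item limit 5000")
>                 .print();
>     }
> }
> {code}
>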
> After investigation, a bug was found in the *LimitableBulkFormat* class.
> For each batch, *numRead* is incremented *1* more than the actual count of
> rows returned by reader.readBatch(), because *numRead* is incremented even
> when next() reaches the end of the current batch.
> With limit 5000 and the default batch size of 2048, the two exhausted
> batches contribute two phantom counts, so reading stops after only 4998
> real rows.
> If there is only 1 input split, no more rows are merged into the final
> result, so Flink returns fewer rows than requested. A simplified sketch of
> the counting pattern follows.
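>
> The sketch below is not the actual LimitableBulkFormat source; it only
> illustrates, with invented names, the counting pattern described above and
> a corrected variant that counts a row only when one is actually returned:
> {code:java}
> import java.util.Iterator;
>
> // Illustrative stand-in for a limit-aware batch reader; names are hypothetical.
> final class CountingBatchReader {
>     private final Iterator<Long> batch; // rows of the current batch
>     private long numRead;               // rows counted against the limit
>
>     CountingBatchReader(Iterator<Long> batch) {
>         this.batch = batch;
>     }
>
>     // Buggy pattern: the end-of-batch signal is counted as if it were a row,
>     // so numRead ends up one higher per batch than the rows actually emitted.
>     Long nextBuggy() {
>         numRead++;
>         return batch.hasNext() ? batch.next() : null;
>     }
>
>     // Corrected pattern: only count when a row is actually returned.
>     Long nextFixed() {
>         if (!batch.hasNext()) {
>             return null;
>         }
>         numRead++;
>         return batch.next();
>     }
>
>     long numRead() {
>         return numRead;
>     }
> }
> {code}
> Once numRead reaches the limit, reading stops; with a single input split the
> phantom counts are never compensated, which matches the 4998-row result above.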
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)