[ 
https://issues.apache.org/jira/browse/IMPALA-13249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-13249.
----------------------------------------
    Fix Version/s: Impala 5.0.0
       Resolution: Fixed

Fixed by IMPALA-13756.

> COUNT(*) query optimisation for Iceberg V2 tables breaks UNION queries in 
> some cases
> ------------------------------------------------------------------------------------
>
>                 Key: IMPALA-13249
>                 URL: https://issues.apache.org/jira/browse/IMPALA-13249
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 4.4.0
>            Reporter: Aleksey
>            Assignee: Noémi Pap-Takács
>            Priority: Critical
>              Labels: impala-iceberg
>             Fix For: Impala 5.0.0
>
>
>  
> The problem occurs when _count_ is queried within a _UNION_ upon Iceberg V2 
> table values that had been inserted later on, when the table was already 
> populated with data.
> {code:java}
> create table prod_wrk.test_iceberg_v2
> (
>  id_field integer,
>  data_field string
> )
> stored as iceberg
> TBLPROPERTIES('format-version'='2');
> insert into prod_wrk.test_iceberg_v2 values (1, "A"), (2, "B"), (3, "C");
> -- Prepare table
> delete from prod_wrk.test_iceberg_v2
> where id_field = 1;
> -- Remove first record
> insert into prod_wrk.test_iceberg_v2 values (1, "D"), (4, "E");
> -- Insert new records with data_field values that were not present before
> select *
> from prod_wrk.test_iceberg_v2
> order by data_field;
> -- data_field = 'D' present
> {code}
>  
>  
> ||id_field ||data_field||
> |2|B|
> |3|C|
> |1|D|
> |4|E|
>  
>  
> At this point querying prod_wrk.test_iceberg_v2 is expected to yield one 
> value with _data_field = 'D'_ in any context
>  
> {code:java}
> select count(1)
> from prod_wrk.test_iceberg_v2
> union all
> select count(1)
> from prod_wrk.test_iceberg_v2
> where data_field = 'D';
> -- the result coming from count(1) with the WHERE clause is incorrect 
> (0){code}
>  
>  
> ||count(1)||
> |4|
> |0|
>  
> Expected result:
> ||count(1)||
> |4|
> |1|
>  
> However, altering the forementioned query for the COUNT optimization to come 
> off produces expected results:
>  
> {code:java}
> select count(1), sum(1)
> from prod_wrk.test_iceberg_v2
> union all
> select count(1), sum(1)
> from prod_wrk.test_iceberg_v2
> where data_field = 'D';
> -- results become correct{code}
>  
>  
> ||count(1)||sum(1)||
> |4|4|
> |1|1|
>  
>  
> {code:java}
> select count(1)
> from prod_wrk.test_iceberg_v2
> where data_field = 'D';
> -- result becomes correct as well{code}
>  
>  
> ||count(1)||
> |1|
>  
> Count optimization was introduced by IMPALA-11802



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to