Aleksey created IMPALA-13249:
--------------------------------

             Summary: COUNT(*) query optimisation for Iceberg V2 tables breaks 
UNION queries in some cases
                 Key: IMPALA-13249
                 URL: https://issues.apache.org/jira/browse/IMPALA-13249
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 4.4.0
            Reporter: Aleksey


 

The problem occurs when a count(*) inside a UNION selects rows of an Iceberg V2 
table that were inserted after the table had already been populated with 
data.
{code:java}
-- Prepare table
create table prod_wrk.test_iceberg_v2
(
 id_field integer,
 data_field string
)
stored as iceberg
TBLPROPERTIES('format-version'='2');
insert into prod_wrk.test_iceberg_v2 values (1, "A"), (2, "B"), (3, "C");
-- Remove first record
delete from prod_wrk.test_iceberg_v2
where id_field = 1;
-- Insert new records with data_field values that were not present before
insert into prod_wrk.test_iceberg_v2 values (1, "D"), (4, "E");
-- Check contents: data_field = 'D' is present
select *
from prod_wrk.test_iceberg_v2
order by data_field;
{code}
 

 
||id_field ||data_field||
|2|B|
|3|C|
|1|D|
|4|E|

 

 

At this point, any query against prod_wrk.test_iceberg_v2 is expected to find 
exactly one row with _data_field = 'D'_, regardless of context.

 
{code:java}
select count(1)
from prod_wrk.test_iceberg_v2
union all
select count(1)
from prod_wrk.test_iceberg_v2
where data_field = 'D';
-- the result coming from count(1) with the WHERE clause is incorrect (0){code}
 

 
||count(1)||
|4|
|0|

 

Expected result:
||count(1)||
|4|
|1|

 

However, altering the aforementioned query so that the COUNT optimization is 
not applied produces the expected results:

 
{code:java}
select count(1), sum(1)
from prod_wrk.test_iceberg_v2
union all
select count(1), sum(1)
from prod_wrk.test_iceberg_v2
where data_field = 'D';
-- results become correct{code}
 

 
||count(1)||sum(1)||
|4|4|
|1|1|

 

 
{code:java}
select count(1)
from prod_wrk.test_iceberg_v2
where data_field = 'D';
-- result becomes correct as well{code}
 

 
||count(1)||
|1|

 


The count optimization was introduced by 
[IMPALA-11802|https://issues.apache.org/jira/browse/IMPALA-11802].
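
The difference between the failing and the working queries above may be visible in their plans. A minimal sketch for comparing them, assuming the EXPLAIN output shows whether the count rewrite was applied to each UNION operand (this has not been verified here, and no plan output is included):

{code:java}
-- Plan of the query that returns the wrong result (0 for the second operand)
explain
select count(1)
from prod_wrk.test_iceberg_v2
union all
select count(1)
from prod_wrk.test_iceberg_v2
where data_field = 'D';
-- Plan of the variant that returns the correct result, for comparison
explain
select count(1), sum(1)
from prod_wrk.test_iceberg_v2
union all
select count(1), sum(1)
from prod_wrk.test_iceberg_v2
where data_field = 'D';
{code}

If the first plan shows no scan for the operand with the WHERE clause while the second plan shows one, that would point to the rewrite being applied to an operand it should not touch.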



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
