Aleksey created IMPALA-13249:
--------------------------------
Summary: COUNT(*) query optimisation for Iceberg V2 tables breaks
UNION queries in some cases
Key: IMPALA-13249
URL: https://issues.apache.org/jira/browse/IMPALA-13249
Project: IMPALA
Issue Type: Bug
Components: Frontend
Affects Versions: Impala 4.4.0
Reporter: Aleksey
The problem occurs when count(*) is queried within a UNION upon Iceberg V2
table values that had been inserted later on, when the table was already
populated with data.
{code:java}
create table prod_wrk.test_iceberg_v2
(
id_field integer,
data_field string
)
stored as iceberg
TBLPROPERTIES('format-version'='2');
insert into prod_wrk.test_iceberg_v2 values (1, "A"), (2, "B"), (3, "C");
-- Prepare table
delete from prod_wrk.test_iceberg_v2
where id_field = 1;
-- Remove first record
insert into prod_wrk.test_iceberg_v2 values (1, "D"), (4, "E");
-- Insert new records with data_field values that were not present before
select *
from prod_wrk.test_iceberg_v2
order by data_field;
-- data_field = 'D' present
{code}
||id_field ||data_field||
|2|B|
|3|C|
|1|D|
|4|E|
At this point querying prod_wrk.test_iceberg_v2 is expected to yield one value
with _data_field = 'D'_ in any context
{code:java}
select count(1)
from prod_wrk.test_iceberg_v2
union all
select count(1)
from prod_wrk.test_iceberg_v2
where data_field = 'D';
-- the result coming from count(1) with the WHERE clause is incorrect (0){code}
||count(1)||
|4|
|0|
Expected result:
||count(1)||
|4|
|1|
However, altering the forementioned query for the COUNT optimization to come
off produces expected results:
{code:java}
select count(1), sum(1)
from prod_wrk.test_iceberg_v2
union all
select count(1), sum(1)
from prod_wrk.test_iceberg_v2
where data_field = 'D';
-- results become correct{code}
||count(1)||sum(1)||
|4|4|
|1|1|
{code:java}
select count(1)
from prod_wrk.test_iceberg_v2
where data_field = 'D';
-- result becomes correct as well{code}
||count(1)||
|1|
Count optimization was introduced by
[IMPALA-11802|https://issues.apache.org/jira/browse/IMPALA-11802]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)