Zoltán Borók-Nagy created IMPALA-13768:
------------------------------------------
Summary: Redundant Iceberg delete records are shuffled around
which cause error "Invalid file path arrived at builder"
Key: IMPALA-13768
URL: https://issues.apache.org/jira/browse/IMPALA-13768
Project: IMPALA
Issue Type: Bug
Components: Backend
Reporter: Zoltán Borók-Nagy
Assignee: Zoltán Borók-Nagy
IcebergDeleteBuilder assumes that it only receives delete records for
paths of data files that are scheduled for its corresponding SCAN operator.
This assumption does not hold in the following cases:
* a single-node plan is executed (no DIRECTED mode, no filtering)
* the number of output channels is 1 (again, no DIRECTED mode, no filtering)
* a bug in DIRECTED mode is hit, see below
The DIRECTED mode bug: in KrpcDataStreamSender::Send(), the variable 'skipped_prev_row' is never checked:
https://github.com/apache/impala/blob/1b6395b8db09d271bd166bf501bdf7038d8be644/be/src/runtime/krpc-data-stream-sender.cc#L1174
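To illustrate how an unchecked skip flag could let redundant delete records reach a builder, here is a minimal C++ sketch. It is a simplified model and not Impala's actual code: DeleteRecord, RouteDirected, path_to_channels and check_skipped are hypothetical names, and only 'skipped_prev_row' is taken from this report. The sketch assumes that the sender caches the previous row's file path and target channels, and that a delete record is skipped when its data file was not scheduled.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a position delete record.
struct DeleteRecord {
  const std::string* file_path;  // pointer identity models rows sharing one path buffer
  long pos;                      // position of the deleted row in the data file
};

// Each routed row is recorded as (channel id, row index).
using Routed = std::vector<std::pair<int, int>>;

// Routes delete records to the channels whose scanners read the referenced
// data file. 'path_to_channels' contains only the scheduled data files.
// With 'check_skipped' == false the function models the suspected bug: a row
// that repeats the path of a skipped row is sent to stale channels.
Routed RouteDirected(const std::vector<DeleteRecord>& rows,
    const std::unordered_map<std::string, std::vector<int>>& path_to_channels,
    bool check_skipped) {
  Routed out;
  const std::string* prev_path = nullptr;
  std::vector<int> prev_channels;
  bool skipped_prev_row = false;
  for (int i = 0; i < static_cast<int>(rows.size()); ++i) {
    const DeleteRecord& row = rows[i];
    if (row.file_path == prev_path) {
      // Fast path: same path as the previous row, so reuse its decision.
      // Without this check, 'prev_channels' still belongs to an older file.
      if (check_skipped && skipped_prev_row) continue;
      for (int ch : prev_channels) out.emplace_back(ch, i);
      continue;
    }
    prev_path = row.file_path;
    auto it = path_to_channels.find(*row.file_path);
    if (it == path_to_channels.end()) {
      // The data file is not scheduled, so its delete record is dropped.
      skipped_prev_row = true;
      continue;
    }
    skipped_prev_row = false;
    prev_channels = it->second;
    for (int ch : prev_channels) out.emplace_back(ch, i);
  }
  return out;
}
```

In this model, given the rows (a, b, b) where only file a is scheduled on channel 0, calling RouteDirected with check_skipped set to false also sends the second b row to channel 0, which is the kind of record a builder would reject. With check_skipped set to true, only the row for a is sent.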
Repro:
{noformat}
create table ice_invalid_deletes (bi bigint, year int)
partitioned by spec (year)
stored as iceberg tblproperties ('format-version'='2');
insert into ice_invalid_deletes select bigint_col, year from
functional.alltypes where month = 10;
with v as (select max(bi) as max_bi from ice_invalid_deletes) insert into
ice_invalid_deletes select bi + v.max_bi, year from v, ice_invalid_deletes;
delete from ice_invalid_deletes where bi % 11 = 0;
-- All of the following queries result in an error:
-- single output channel
select count(*) from ice_invalid_deletes where year=2010 and bi = 180;
-- bug in KrpcDataStreamSender::Send
select count(*) from ice_invalid_deletes where year>2010 and bi = 180;
-- single node plan
set num_nodes=1;
select count(*) from ice_invalid_deletes where year>2010 and bi = 180;{noformat}