[ https://issues.apache.org/jira/browse/IMPALA-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zoltán Borók-Nagy resolved IMPALA-13768.
----------------------------------------
Fix Version/s: Impala 4.5.0
Resolution: Fixed
> Redundant Iceberg delete records are shuffled around which cause error
> "Invalid file path arrived at builder"
> -------------------------------------------------------------------------------------------------------------
>
> Key: IMPALA-13768
> URL: https://issues.apache.org/jira/browse/IMPALA-13768
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Zoltán Borók-Nagy
> Assignee: Zoltán Borók-Nagy
> Priority: Major
> Labels: impala-iceberg
> Fix For: Impala 4.5.0
>
>
> IcebergDeleteBuilder assumes that it only receives delete records for the
> paths of data files that are scheduled for its corresponding SCAN operator.
> This assumption does not hold in the following cases:
> * a single-node plan is executed (no DIRECTED mode, no filtering)
> * the number of output channels is 1 (again, no DIRECTED mode, no filtering)
> * a bug in DIRECTED mode, described below
> In KrpcDataStreamSender::Send(), the variable 'skipped_prev_row' is never
> checked:
> [https://github.com/apache/impala/blob/1b6395b8db09d271bd166bf501bdf7038d8be644/be/src/runtime/krpc-data-stream-sender.cc#L1174]
> Repro:
> {noformat}
> create table ice_invalid_deletes (bi bigint, year int)
> partitioned by spec (year)
> stored as iceberg tblproperties ('format-version'='2');
> insert into ice_invalid_deletes select bigint_col, year from
> functional.alltypes where month = 10;
> with v as (select max(bi) as max_bi from ice_invalid_deletes) insert into
> ice_invalid_deletes select bi + v.max_bi, year from v, ice_invalid_deletes;
> delete from ice_invalid_deletes where bi % 11 = 0;
> -- All of the following queries result in an error:
> -- single output channel
> select count(*) from ice_invalid_deletes where year=2010 and bi = 180;
> -- bug in KrpcDataStreamSender::Send
> select count(*) from ice_invalid_deletes where year>2000 and bi = 180;
> -- single node plan
> set num_nodes=1;
> select count(*) from ice_invalid_deletes where year>2000 and bi = 180;
> {noformat}
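
To illustrate why an unchecked 'skipped_prev_row' flag can matter in DIRECTED mode, here is a minimal standalone C++ sketch. It is not the Impala code. The types DeleteRecord and Routing, the functions Route() and CountRoutedToChannel0(), and the check_skipped switch are all hypothetical. The sketch also assumes that the sender caches the previous row's file path and target channels and reuses them when the next row has the same path. The issue itself only states that the variable is never checked.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a position delete record.
struct DeleteRecord {
  const std::string* file_path;  // compared by pointer, like a shared string buffer
  long long pos;
};

// Hypothetical routing table: scheduled data file path -> target channels.
using Routing = std::unordered_map<std::string, std::vector<int>>;
using ChannelRows = std::vector<std::vector<DeleteRecord>>;

// Routes a batch of delete records. When 'check_skipped' is false, the
// fast path ignores 'skipped_prev_row' (the assumed buggy behavior).
ChannelRows Route(const std::vector<DeleteRecord>& batch, const Routing& routing,
                  int num_channels, bool check_skipped) {
  ChannelRows out(num_channels);
  const std::string* prev_path = nullptr;
  const std::vector<int>* prev_channels = nullptr;
  bool skipped_prev_row = false;
  for (const DeleteRecord& rec : batch) {
    if (rec.file_path == prev_path) {
      // Fast path: same file as the previous row, so reuse its decision.
      if (check_skipped && skipped_prev_row) continue;  // previous row was dropped
      if (prev_channels != nullptr) {
        for (int ch : *prev_channels) {
          assert(ch >= 0 && ch < num_channels);
          out[ch].push_back(rec);
        }
      }
      continue;
    }
    prev_path = rec.file_path;
    auto it = routing.find(*rec.file_path);
    if (it == routing.end()) {
      // Data file is not scheduled anywhere: drop the record.
      // 'prev_channels' still points at the previous file's channels.
      skipped_prev_row = true;
      continue;
    }
    skipped_prev_row = false;
    prev_channels = &it->second;
    for (int ch : *prev_channels) {
      assert(ch >= 0 && ch < num_channels);
      out[ch].push_back(rec);
    }
  }
  return out;
}

// Builds the batch A, A, B, B where only file A is scheduled (on channel 0)
// and returns how many records end up on channel 0.
std::size_t CountRoutedToChannel0(bool check_skipped) {
  const std::string a = "data/a.parquet";
  const std::string b = "data/b.parquet";
  const Routing routing = {{a, {0}}};
  const std::vector<DeleteRecord> batch = {{&a, 1}, {&a, 2}, {&b, 1}, {&b, 2}};
  return Route(batch, routing, /*num_channels=*/1, check_skipped)[0].size();
}
```

In this sketch, the unchecked variant forwards the second record of the unscheduled file B to channel 0 through the stale cached channel list, while the checked variant drops both B records. A receiver that expects only records for its own data files would then see an unexpected path, which matches the symptom described in the issue under the stated assumptions.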