Zoltán Borók-Nagy created IMPALA-13768:
------------------------------------------

             Summary: Redundant Iceberg delete records are shuffled around,
which causes the error "Invalid file path arrived at builder"
                 Key: IMPALA-13768
                 URL: https://issues.apache.org/jira/browse/IMPALA-13768
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
            Reporter: Zoltán Borók-Nagy
            Assignee: Zoltán Borók-Nagy


IcebergDeleteBuilder assumes that it should only receive delete records for 
paths of data files that are scheduled for its corresponding SCAN operator.

This assumption does not hold in the following cases:
 * a single-node plan is executed (no DIRECTED mode, no filtering)
 * the number of output channels is 1 (again, no DIRECTED mode, no filtering)
 * DIRECTED mode is used, but it has a bug, see below
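
The assumption and the way an unfiltered sender breaks it can be modelled in a few lines. This is a simplified sketch, not Impala's implementation: DeleteBuilderModel, send_unfiltered and the file paths are hypothetical names chosen for illustration. Only the error text is taken from this issue.

```python
# Simplified model, not Impala's implementation. All class, function and
# file names here are hypothetical.

class DeleteBuilderModel:
    """Accepts delete records only for data files scheduled on its SCAN."""

    def __init__(self, scheduled_paths):
        self.scheduled_paths = set(scheduled_paths)
        self.deleted_positions = {}  # file path -> list of deleted row positions

    def add(self, file_path, pos):
        # The builder trusts the sender to have filtered out other paths.
        if file_path not in self.scheduled_paths:
            raise ValueError("Invalid file path arrived at builder: " + file_path)
        self.deleted_positions.setdefault(file_path, []).append(pos)


def send_unfiltered(delete_records, builder):
    """Models a single output channel: every record is forwarded as-is."""
    for file_path, pos in delete_records:
        builder.add(file_path, pos)


# The scan reads only the year=2010 file, but delete records exist for both.
builder = DeleteBuilderModel(["year=2010/data.parquet"])
records = [("year=2010/data.parquet", 0), ("year=2009/data.parquet", 3)]
try:
    send_unfiltered(records, builder)
    error = None
except ValueError as e:
    error = str(e)
print(error)
```

The first record is accepted and the second one, which belongs to a data file the scan never reads, triggers the error.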

In DIRECTED mode, KrpcDataStreamSender::Send() never checks the variable 'skipped_prev_row':
https://github.com/apache/impala/blob/1b6395b8db09d271bd166bf501bdf7038d8be644/be/src/runtime/krpc-data-stream-sender.cc#L1174
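
One way an unchecked flag of this kind can misroute rows is sketched below. It is a guess at the failure mode for illustration, not a transcription of the linked code: it assumes the sender caches the channels resolved for the previous row's file path and reuses them when the next row carries the same path. route_directed, check_skipped and the file names are hypothetical.

```python
# Illustrative model only. It assumes (without confirmation from the linked
# source) that the sender reuses the previous row's channels when the file
# path repeats. All names are hypothetical.

def route_directed(rows, path_to_channels, check_skipped):
    """Route (file_path, pos) delete records to output channels.

    path_to_channels holds only the data files that are scheduled for
    scanning; records for any other path should be dropped.
    """
    out = {}  # channel id -> list of records sent to it
    prev_path = None
    prev_channels = []
    skipped_prev_row = False
    for path, pos in rows:
        if path == prev_path:
            # Fast path: same file as the previous row, skip the lookup.
            if check_skipped and skipped_prev_row:
                continue  # previous row was dropped, so drop this one too
            for ch in prev_channels:
                out.setdefault(ch, []).append((path, pos))
            continue
        prev_path = path
        channels = path_to_channels.get(path)
        if channels is None:
            # Unscheduled file: drop the row. prev_channels is now stale.
            skipped_prev_row = True
            continue
        skipped_prev_row = False
        prev_channels = channels
        for ch in channels:
            out.setdefault(ch, []).append((path, pos))
    return out


rows = [("a.parquet", 1), ("b.parquet", 1), ("b.parquet", 2)]
scheduled = {"a.parquet": [0]}  # b.parquet is not scheduled anywhere

buggy = route_directed(rows, scheduled, check_skipped=False)
fixed = route_directed(rows, scheduled, check_skipped=True)
print(buggy)
print(fixed)
```

In this model the second b.parquet row is sent to channel 0 when the flag is ignored, so a builder that only expects a.parquet receives a record for a file it does not know about. With the flag checked, both b.parquet rows are dropped.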

Repro:
{noformat}
create table ice_invalid_deletes (bi bigint, year int)
partitioned by spec (year)
stored as iceberg tblproperties ('format-version'='2');

insert into ice_invalid_deletes select bigint_col, year from 
functional.alltypes where month = 10;

with v as (select max(bi) as max_bi from ice_invalid_deletes) insert into 
ice_invalid_deletes select bi + v.max_bi, year from v, ice_invalid_deletes;

delete from ice_invalid_deletes where bi % 11 = 0;

-- All of the following queries result in the error:
-- single output channel
select count(*) from ice_invalid_deletes where year=2010 and bi = 180;
-- bug in KrpcDataStreamSender::Send
select count(*) from ice_invalid_deletes where year>2010 and bi = 180;
-- single node plan
set num_nodes=1;
select count(*) from ice_invalid_deletes where year>2010 and bi = 180;{noformat}


