[ https://issues.apache.org/jira/browse/IMPALA-13768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17930175#comment-17930175 ]
ASF subversion and git services commented on IMPALA-13768:
----------------------------------------------------------
Commit e4e80edef9bf3bf22eb5621d41e9995c30e305f8 in impala's branch
refs/heads/branch-4.5.0 from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e4e80edef ]
IMPALA-13768: Redundant Iceberg delete records are shuffled around which cause
error "Invalid file path arrived at builder"
IcebergDeleteBuilder assumes that it should only receive delete
records for paths of data files that are scheduled for its
corresponding SCAN operator.
This assumption does not hold when any of the following happens:
* the number of output channels in the sender is 1
(currently no DIRECTED mode, no filtering)
* a bug in DIRECTED mode is hit, see below
* a single node plan is being used (no DIRECTED mode, no filtering)
With this patch, KrpcDataStreamSender::Send() will use DIRECTED mode
even if the number of output channels is 1. The patch also fixes the bug
in DIRECTED mode (which was due to an unused variable 'skipped_prev_row')
and simplifies the logic a bit.
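
The idea behind DIRECTED distribution can be illustrated with a small toy
model. This is only a sketch under stated assumptions, not the code in
KrpcDataStreamSender::Send(): the function name route_directed, the
file_to_channel map, and the (file_path, position) tuples are all
hypothetical. It shows delete records being sent only to the channel whose
scanner reads the referenced data file, and being dropped when no scanner
reads that file, even when there is just one output channel.

```python
from collections import defaultdict


def route_directed(delete_records, file_to_channel):
    """Group position-delete records by the channel scanning their data file.

    delete_records: iterable of (file_path, position) tuples (hypothetical).
    file_to_channel: hypothetical map from data file path to channel index,
        containing only files that are scheduled for some scanner.
    Returns (records per channel, number of skipped records).
    """
    per_channel = defaultdict(list)
    skipped = 0
    for file_path, position in delete_records:
        channel = file_to_channel.get(file_path)
        if channel is None:
            # No scanner reads this file, so the delete record is redundant
            # and is filtered out instead of being shipped to a builder.
            skipped += 1
            continue
        per_channel[channel].append((file_path, position))
    return dict(per_channel), skipped


# Single output channel: filtering still applies in this model.
records = [("a.parquet", 0), ("b.parquet", 3), ("a.parquet", 7)]
routed, skipped = route_directed(records, {"a.parquet": 0})
```

In this model the record for b.parquet never reaches a builder, which is the
behaviour the patch describes for the single-channel case.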
The patch also relaxes the assumption in IcebergDeleteBuilder, i.e. it
only returns an error for dangling delete records when we are in a
distributed plan, where we can assume DIRECTED distribution mode of
position delete records.
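
The relaxed check can be sketched as follows. This is an illustrative toy
model, not IcebergDeleteBuilder itself: build_delete_index,
DanglingDeleteError, scheduled_files and is_distributed_plan are
hypothetical names. Only the error text is taken from the issue title.

```python
class DanglingDeleteError(Exception):
    """Raised for a delete record whose data file is not scheduled here."""


def build_delete_index(delete_records, scheduled_files, is_distributed_plan):
    """Collect deleted positions per scheduled data file.

    A dangling record (file not in scheduled_files) is an error only in a
    distributed plan, where DIRECTED distribution is assumed. Otherwise it
    is silently ignored.
    """
    index = {path: [] for path in scheduled_files}
    for file_path, position in delete_records:
        if file_path not in index:
            if is_distributed_plan:
                raise DanglingDeleteError(
                    f"Invalid file path arrived at builder: {file_path}")
            # Single node plan: redundant records are expected, skip them.
            continue
        index[file_path].append(position)
    return index


# Single node plan: the dangling record for b.parquet is ignored.
index = build_delete_index(
    [("a.parquet", 4), ("b.parquet", 1)], {"a.parquet"}, False)
```

Passing True as the last argument with the same input would raise
DanglingDeleteError in this sketch.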
Testing
* added e2e tests
Change-Id: I695c919c9a74edec768e413a02b2ef7dbfa0d6a5
Reviewed-on: http://gerrit.cloudera.org:8080/22500
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Redundant Iceberg delete records are shuffled around which cause error
> "Invalid file path arrived at builder"
> -------------------------------------------------------------------------------------------------------------
>
> Key: IMPALA-13768
> URL: https://issues.apache.org/jira/browse/IMPALA-13768
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Zoltán Borók-Nagy
> Assignee: Zoltán Borók-Nagy
> Priority: Major
> Labels: impala-iceberg
> Fix For: Impala 4.5.0
>
>
> IcebergDeleteBuilder assumes that it should only receive delete records for
> paths of data files that are scheduled for its corresponding SCAN operator.
> It is not true in the following cases:
> * single node plan is executed (no DIRECTED mode, no filtering)
> * number of output channels is 1 (again, no DIRECTED mode, no filtering)
> * bug in DIRECTED mode, see below
> In KrpcDataStreamSender::Send(), variable 'skipped_prev_row' is never
> checked:
> [https://github.com/apache/impala/blob/1b6395b8db09d271bd166bf501bdf7038d8be644/be/src/runtime/krpc-data-stream-sender.cc#L1174]
> Repro:
> {noformat}
> create table ice_invalid_deletes (bi bigint, year int)
> partitioned by spec (year)
> stored as iceberg tblproperties ('format-version'='2');
> insert into ice_invalid_deletes select bigint_col, year from
> functional.alltypes where month = 10;
> with v as (select max(bi) as max_bi from ice_invalid_deletes) insert into
> ice_invalid_deletes select bi + v.max_bi, year from v, ice_invalid_deletes;
> delete from ice_invalid_deletes where bi % 11 = 0;
> -- All of the following result in an error:
> -- single output channel
> select count(*) from ice_invalid_deletes where year=2010 and bi = 180;
> -- bug in KrpcDataStreamSender::Send
> select count(*) from ice_invalid_deletes where year>2000 and bi = 180;
> -- single node plan
> set num_nodes=1;
> select count(*) from ice_invalid_deletes where year>2000 and bi = 180;
> {noformat}