Zoltán Borók-Nagy created IMPALA-13087:
------------------------------------------
Summary: DML operations on Iceberg tables should not write
position delete files for data files that are completely removed
Key: IMPALA-13087
URL: https://issues.apache.org/jira/browse/IMPALA-13087
Project: IMPALA
Issue Type: Bug
Reporter: Zoltán Borók-Nagy
Users sometimes use the DELETE operation even in cases where they should use
TRUNCATE or DROP PARTITION. In those cases the DELETE writes far too many
position delete records, which hurts the performance of subsequent queries. On
top of that, these delete records are unnecessary, because we should just
remove the corresponding data files from the new snapshot.
We need to make the DML operations smarter so that they only write position
delete records if they don't delete whole files. The IcebergBufferedDeleteSink
has the FilePositions type:
https://github.com/apache/impala/blob/bbfba13ed4d084681b542d7c5e1b5156576a603b/be/src/exec/iceberg-buffered-delete-sink.h#L66
It is a mapping from data files to the positions we are about to delete. After
SortBufferedRecords() the positions are in order and there are no duplicates.
Therefore, if pos_vector.back() == pos_vector.size() - 1, we know we are
about to delete a contiguous range of positions from 0 to N - 1, where N is
pos_vector.size(). At this point we need to look up the number of records in
the corresponding data file, and if that number is N, we know we are about to
delete the whole file.
In this case we shouldn't write the position delete records, but instead
register the data file in dml_exec_state_ for deletion.
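The check described above could be sketched roughly as follows. This is only an
illustration under stated assumptions, not the actual sink code: FilePositions
here is a simplified stand-in for the real type, and DeletesWholeFile,
DeletePlan, PlanDeletes and the record_counts lookup are hypothetical names.
How the record count of a data file is obtained in the sink is left open.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified stand-in (assumption): data file path -> positions to delete,
// already sorted and de-duplicated, as after SortBufferedRecords().
using FilePositions = std::unordered_map<std::string, std::vector<int64_t>>;

// Hypothetical helper: true if the positions cover every row of the file.
// Sorted, unique, non-negative positions whose last element equals size() - 1
// must be exactly 0, 1, ..., size() - 1.
bool DeletesWholeFile(const std::vector<int64_t>& positions,
                      int64_t record_count) {
  assert(record_count >= 0);
  if (positions.empty()) return false;
  const int64_t n = static_cast<int64_t>(positions.size());
  return positions.back() == n - 1 && n == record_count;
}

// Hypothetical result type: which files to drop from the snapshot and which
// positions still need position delete records.
struct DeletePlan {
  std::vector<std::string> files_to_remove;
  FilePositions positions_to_write;
};

// Hypothetical planner: record_counts maps data file path -> number of
// records. A file with an unknown record count falls back to position deletes.
DeletePlan PlanDeletes(
    const FilePositions& sorted_positions,
    const std::unordered_map<std::string, int64_t>& record_counts) {
  DeletePlan plan;
  for (const auto& [path, positions] : sorted_positions) {
    auto it = record_counts.find(path);
    if (it != record_counts.end() && DeletesWholeFile(positions, it->second)) {
      plan.files_to_remove.push_back(path);
    } else {
      plan.positions_to_write.emplace(path, positions);
    }
  }
  return plan;
}
```

In this sketch, files_to_remove corresponds to the data files that would be
registered for deletion, and positions_to_write to the files that still get
position delete records.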
Then in the IcebergCatalogOpExecutor we should use Iceberg's DeleteFiles to
remove the registered data files in the same Iceberg transaction as the
RowDelta operation.
The DELETE statement is the most critical here, but UPDATE and MERGE might also
benefit from this.