Noemi Pap-Takacs created IMPALA-13173:
-----------------------------------------
Summary: Redundant Catalog Update Check in Coordinator.
Key: IMPALA-13173
URL: https://issues.apache.org/jira/browse/IMPALA-13173
Project: IMPALA
Issue Type: Bug
Components: Backend, be
Reporter: Noemi Pap-Takacs
Assignee: Noemi Pap-Takacs
For DML operations, the Coordinator sends an update to the Catalog about
the files changed in the table. Before sending the update, we check whether any
files were added or deleted. If not, we skip the catalog update. See
the logic in _'DmlExecState::PrepareCatalogUpdate'_.
However, for unpartitioned Iceberg tables, the check in
_'DmlExecState::PrepareCatalogUpdate'_ always returns true, so the Coordinator
updates the Catalog even if no files were added. Currently, this does not cause
incorrect behavior because the condition is checked again later in
client-request-state.cc.
On the other hand, there are cases where not writing any files does not equal a
NO-OP, for example overwriting a table with empty content, or an OPTIMIZE TABLE
that merges delete files. The Catalog needs to be informed about the changes in
such cases.
We should filter NO-OP DMLs correctly in the Coordinator, eliminating both
false-positive and false-negative catalog updates.
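The intended decision could be sketched roughly as follows. This is only an illustration of the rule described above, not Impala's actual code: the DmlSummary struct, its fields, and NeedsCatalogUpdate are hypothetical names, and the real logic lives in DmlExecState::PrepareCatalogUpdate and client-request-state.cc.

```cpp
#include <cstdint>

// Hypothetical summary of what a finished DML statement did.
// These fields are assumptions for illustration, not Impala's real types.
struct DmlSummary {
  int64_t files_added = 0;
  int64_t files_deleted = 0;
  // True for statements that can change table state without writing files,
  // e.g. overwriting a table with empty content.
  bool is_overwrite = false;
  // True for an OPTIMIZE TABLE that merges delete files.
  bool is_optimize = false;
};

// Returns true if the Coordinator should send a catalog update.
inline bool NeedsCatalogUpdate(const DmlSummary& s) {
  // Any added or deleted file is a change the Catalog must learn about.
  if (s.files_added > 0 || s.files_deleted > 0) return true;
  // No files touched: only a real NO-OP may skip the update.
  return s.is_overwrite || s.is_optimize;
}
```

With a single predicate like this, a plain INSERT that wrote nothing would skip the update regardless of the table type, while an empty overwrite would still reach the Catalog.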