Noemi Pap-Takacs created IMPALA-13173:
-----------------------------------------

             Summary: Redundant Catalog Update Check in Coordinator.
                 Key: IMPALA-13173
                 URL: https://issues.apache.org/jira/browse/IMPALA-13173
             Project: IMPALA
          Issue Type: Bug
          Components: Backend, be
            Reporter: Noemi Pap-Takacs
            Assignee: Noemi Pap-Takacs


For DML operations, the Coordinator sends an update to the Catalog about 
the files changed in the table. Before sending the update, we check whether any 
files were added or deleted. If none were, we skip the catalog update. See 
the logic in _'DmlExecState::PrepareCatalogUpdate'._

However, for unpartitioned Iceberg tables, the check in 
_'DmlExecState::PrepareCatalogUpdate'_ always returns true, and the Coordinator 
updates the Catalog even if no files were added. Currently, this does not cause 
incorrect behavior because the condition is checked again later in 
client-request-state.cc.

On the other hand, there are cases where not writing any files is not a 
NO-OP, for example overwriting a table with empty content, or an OPTIMIZE TABLE 
that merges delete files. The Catalog needs to be informed about the changes in 
such cases.

We should filter NO-OP DMLs correctly in the Coordinator, eliminating both 
false-positive and false-negative updates.
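
A minimal sketch of what such a filter could look like, to make the two failure 
modes concrete. This is not the existing Impala code: 'DmlSummary', its fields 
and 'NeedsCatalogUpdate' are hypothetical names chosen for illustration, and the 
real decision would have to be derived from the state tracked in DmlExecState.

```cpp
#include <cstdint>

// Hypothetical summary of a finished DML statement (illustration only;
// this is not Impala's DmlExecState).
struct DmlSummary {
  int64_t files_added = 0;
  int64_t files_deleted = 0;
  // Assumed flag: true when the statement replaces existing table content
  // regardless of what it writes, e.g. an overwrite with empty content or
  // an OPTIMIZE TABLE that merges delete files.
  bool replaces_existing_content = false;
};

// Returns true if the Catalog has to be informed about this DML.
bool NeedsCatalogUpdate(const DmlSummary& s) {
  // Any change to the file set must be reported.
  if (s.files_added > 0 || s.files_deleted > 0) return true;
  // No files were written, but the table content may still have changed.
  // This is deliberately conservative: it avoids false negatives.
  return s.replaces_existing_content;
}
```

With this shape, a plain INSERT that writes nothing is treated as a NO-OP for 
every table type, while an overwrite that produces no files still triggers the 
update.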



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
