[ https://issues.apache.org/jira/browse/IMPALA-14637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18047474#comment-18047474 ]

Quanlong Huang commented on IMPALA-14637:
-----------------------------------------

I checked the Hive code, and I think it's intentional that DDLs such as truncation 
operations don't have a 
[WriteEventInfo|https://github.com/apache/hive/blob/e3cb93958ee470b261f44f99fc49716071f05b58/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift#L1136], 
which contains a list of data files. We also have comments in Impala mentioning this, 
e.g. 
[code1|https://github.com/apache/impala/blob/85d77b908b12ae3d3f48ed5d49f38fb3832edc4e/fe/src/main/java/org/apache/impala/catalog/Catalog.java#L99-L101], 
[code2|https://github.com/apache/impala/blob/85d77b908b12ae3d3f48ed5d49f38fb3832edc4e/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java#L962-L964].

As a result, truncation operations are missing when processing COMMIT_TXN events, 
so the ValidWriteIds list and file metadata won't be updated. If the truncating 
transaction has already committed or aborted by the time Impala processes the 
ALTER event, the loaded file metadata is up-to-date. However, if the transaction 
is still open at that time, Impala loads a stale snapshot and needs to reload it 
when processing the COMMIT_TXN event.
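To make the race concrete, here is a toy Python model of it. Everything in it is hypothetical: Storage, CatalogTable, on_alter_event, on_commit_txn_event and the reload_stale flag are illustrative names, not Impala or Hive APIs. It only assumes what is described above: TRUNCATE produces no WriteEventInfo, readers see committed data only, and the ALTER event may be processed before or after the transaction commits. The "remember stale txns and reload on COMMIT_TXN" branch is one possible remedy, not necessarily what the patch under review does.

```python
class Storage:
    """Hypothetical stand-in for table files plus HMS transaction state."""

    def __init__(self, files):
        self.committed_files = list(files)
        self.pending = {}  # txn_id -> file list that becomes visible on commit

    def truncate(self, txn_id):
        # TRUNCATE is a DDL: it records no WriteEventInfo (no file list).
        self.pending[txn_id] = []

    def is_open(self, txn_id):
        return txn_id in self.pending

    def commit(self, txn_id):
        self.committed_files = self.pending.pop(txn_id)

    def snapshot(self):
        # Readers only see committed data.
        return list(self.committed_files)


class CatalogTable:
    """Hypothetical catalog-side cache of a table's file metadata."""

    def __init__(self, storage):
        self.storage = storage
        self.files = storage.snapshot()
        self.stale_txns = set()  # txns whose ALTER was processed while still open

    def on_alter_event(self, txn_id):
        self.files = self.storage.snapshot()
        if self.storage.is_open(txn_id):
            # Loaded before the truncate committed: this snapshot is stale.
            self.stale_txns.add(txn_id)

    def on_commit_txn_event(self, txn_id, write_event_infos, reload_stale):
        # COMMIT_TXN only carries WriteEventInfo entries for DML writes, so a
        # truncate-only txn triggers no reload unless we track it ourselves.
        if write_event_infos or (reload_stale and txn_id in self.stale_txns):
            self.files = self.storage.snapshot()
        self.stale_txns.discard(txn_id)


def replay(alter_processed_before_commit, reload_stale):
    storage = Storage(["f1", "f2", "f3"])
    table = CatalogTable(storage)
    storage.truncate(txn_id=1)
    if alter_processed_before_commit:
        table.on_alter_event(txn_id=1)  # txn 1 still open
        storage.commit(1)
    else:
        storage.commit(1)
        table.on_alter_event(txn_id=1)  # txn 1 already committed
    table.on_commit_txn_event(1, write_event_infos=[], reload_stale=reload_stale)
    return len(table.files)


print(replay(alter_processed_before_commit=False, reload_stale=False))  # 0
print(replay(alter_processed_before_commit=True, reload_stale=False))   # 3 (stale)
print(replay(alter_processed_before_commit=True, reload_stale=True))    # 0
```

Only the second ordering leaves stale file metadata behind, which matches the flaky symptom of the truncated table still reporting rows.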

Uploaded a fix for review: https://gerrit.cloudera.org/c/23805/

> test_event_based_replication is flaky for truncate table
> --------------------------------------------------------
>
>                 Key: IMPALA-14637
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14637
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>         Attachments: 
> catalogd.0beeab027c72.impala.log.INFO.20251218-020359.1.gz, 
> hadoop-hdfs.tar.gz, hive-metastore.log.gz, hive-server2.log.gz, 
> impalad.52d7008a6fb3.impala.log.INFO.20251218-020359.1_part.gz
>
>
> Saw this test fail again, like in IMPALA-12187: 
> https://jenkins.impala.io/job/ubuntu-20.04-dockerised-tests/4377
> {code:python}
> metadata/test_event_processing.py:122: in test_event_based_replication
>     self._run_event_based_replication_tests_impl(self,
> metadata/test_event_processing_base.py:305: in _run_event_based_replication_tests_impl
>     assert rows_in_part_tbl_target == 0
> E   assert 100 == 0{code}
> But the implementation is quite different now that we have both the 
> hms_event_sync feature and hierarchical event processing enabled.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
