[
https://issues.apache.org/jira/browse/IMPALA-14637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18046591#comment-18046591
]
Quanlong Huang commented on IMPALA-14637:
-----------------------------------------
The test truncates the table in source_db in Impala. Then run REPL DUMP + REPL
LOAD in Hive to truncate the table in target_db. Finally verifying the table is
empty in Impala failed.
The REPL LOAD command finishes at 03:01:39,567 and the failed Impala query
starts at 03:01:40,407:
{noformat}
repl load repl_source_ozsye into repl_target_wkunk
-- 2025-12-18 03:01:39,567 INFO MainThread: hs2: executing against Impala
at localhost:21050. session: ea4975a16d9cc691:da8734908144f3ae main_cursor:
True user: None
select count(*) from repl_target_wkunk.unpart_tbl; // This passes. Result is 0.
...
select count(*) from repl_target_wkunk.part_tbl; // This fails. Result is not
0.
-- 2025-12-18 03:01:40,407 INFO MainThread:
504c3091868ad1d4:0c00f08300000000: query started{noformat}
In catalogd logs, query waits enough time until the events are processed:
{noformat}
I20251218 03:01:40.339617 693 CatalogServiceCatalog.java:4713]
504c3091868ad1d4:0c00f08300000000] waitForHmsEvent request:
want_minimal_response=true, coordinator=192.168.124.4, timeout_s=10,
want_db_list=false, want_table_list=false,
objects=[TABLE:repl_target_wkunk.part_tbl, TABLE:default.repl_target_wkunk,
DATABASE:default, DATABASE:repl_target_wkunk], should_expand_views=true
...
I20251218 03:01:40.349382 693 MetastoreEventsProcessor.java:1944]
504c3091868ad1d4:0c00f08300000000] Last synced event id (47169) reached 47169
I20251218 03:01:40.349473 693 CatalogServiceCatalog.java:4790]
504c3091868ad1d4:0c00f08300000000] waitForHmsEvent succeeds.
updated_objects=[TABLE:repl_target_wkunk.part_tbl(55900),
DATABASE:default(42206), DATABASE:repl_target_wkunk(55905)],
removed_objects=[]{noformat}
The last time that catalogd reloads file metadata of the table is
03:01:38.367844
{noformat}
I20251218 03:01:38.353439 160 HdfsTable.java:1256] Reloading metadata for
table definition and all partition(s) of repl_target_wkunk.part_tbl
(ALTER_BATCH_PARTITIONS 2 events from 47101 to 47102)
I20251218 03:01:38.354635 160 HdfsTable.java:1923] Loaded 13 columns from
HMS. Actual columns: 13
I20251218 03:01:38.361160 160 HdfsTable.java:2970] Valid writeIds of table
repl_target_wkunk.part_tbl: repl_target_wkunk.part_tbl:4:4:4:
I20251218 03:01:38.361230 160 HdfsTable.java:3003] Load Valid Write Id List
Done. Time taken: 1.785ms
I20251218 03:01:38.363641 160 ParallelFileMetadataLoader.java:230] Refreshing
file and block metadata for 4 paths for table repl_target_wkunk.part_tbl using
a thread pool of size 4
I20251218 03:01:38.367844 160 HdfsTable.java:781] Loaded file and block
metadata for repl_target_wkunk.part_tbl partitions: year=2009/month=4,
year=2009/month=3, year=2009/month=1, and 1 others. Time taken:
5.679ms{noformat}
The valid writeIds shows that writeId 4 is still open:
repl_target_wkunk.part_tbl: repl_target_wkunk.part_tbl:4:4:4:
So it's expected that when processing the COMMIT_TXN event, catalogd should
reload file metadata modified by this writeId. However, catalogd logs show that
this didn't happen:
{noformat}
I20251218 03:01:39.249342 160 CatalogOpExecutor.java:5549] Not reloading
partitions of table repl_target_wkunk.part_tbl for event 47109 since the cache
is already up-to-date
I20251218 03:01:39.249399 160 CatalogServiceCatalog.java:4808] Trying to add
COMMITTED write ids [4] to table repl_target_wkunk.part_tbl for event 47109
I20251218 03:01:39.249475 160 HdfsTable.java:3095] Setting the catalog
version of HdfsTable@2df58760 repl_target_wkunk.part_tbl to 55900
I20251218 03:01:39.249501 160 CatalogServiceCatalog.java:4883] Added
COMMITTED writeId to table repl_target_wkunk.part_tbl: [4] for event 47109
I20251218 03:01:39.249531 160 MetastoreEvents.java:936] EventId: 47109
EventType: COMMIT_TXN Target: repl_target_wkunk.part_tbl. Scheduling delay:
31ms, Process time:
I20251218 03:01:39.249559 160 MetastoreEvents.java:947] EventId: 47109
EventType: COMMIT_TXN Target: CLUSTER_WIDE. Complete process time:
31ms{noformat}
In a passed run of the test, the valid writeIds loaded in the last file
metadata loading is "<table_name>:4:9223372036854775807::". So it won't be
impacted by this COMMIT_TXN event.
Tracking the code, it seems HMS returns an empty WriteEventInfo list for this
transaction here.
[https://github.com/apache/impala/blob/1684c2d9da20cd27dcfc30dbaf940ac8144b496f/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java#L1193-L1194]
CC [~VenuReddy], [~hemanth619] for more thoughts.
> test_event_based_replication is flaky for truncate table
> --------------------------------------------------------
>
> Key: IMPALA-14637
> URL: https://issues.apache.org/jira/browse/IMPALA-14637
> Project: IMPALA
> Issue Type: Bug
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Major
>
> Saw this test failed again like IMPALA-12187:
> https://jenkins.impala.io/job/ubuntu-20.04-dockerised-tests/4377
> {code:python}
> metadata/test_event_processing.py:122: in test_event_based_replication
> self._run_event_based_replication_tests_impl(self,
> metadata/test_event_processing_base.py:305: in
> _run_event_based_replication_tests_impl
> assert rows_in_part_tbl_target == 0
> E assert 100 == 0{code}
> But the implementation is much different now that we have both the
> hms_event_sync feature and hierarchical event processing enabled.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]