[ 
https://issues.apache.org/jira/browse/IMPALA-14330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang resolved IMPALA-14330.
-------------------------------------
    Fix Version/s: Impala 5.0.0
       Resolution: Fixed

Resolving this. Thank [~rizaon] and [~MikaelSmith] for the review!

> Global INVALIDATE METADATA should set a valid createEventId instead of -1 on 
> all tables
> ---------------------------------------------------------------------------------------
>
>                 Key: IMPALA-14330
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14330
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>             Fix For: Impala 5.0.0
>
>         Attachments: 
> catalogd.ip-172-31-59-7.ubuntu.log.INFO.20250819-225701.1846090, 
> impalad.ip-172-31-59-7.ubuntu.log.WARNING.20250819-225701.1846122
>
>
> Similar to IMPALA-14307, using -1 as the createEventId leads to the table 
> being dropped by stale events, e.g. ALTER_TABLE RENAME events. We should 
> avoid doing so generally. Or at least in catalog reset (i.e. global 
> INVALIDATE METADATA) to make the tests in TestConcurrentDdls stable.
> One recent failure is 
> [https://jenkins.impala.io/job/ubuntu-20.04-from-scratch/6409]
> {noformat}
> custom_cluster/test_concurrent_ddls.py:95: in 
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap
>     self._run_ddls_with_invalidation(unique_database, sync_ddl=False)
> custom_cluster/test_concurrent_ddls.py:179: in _run_ddls_with_invalidation
>     worker[i].get(timeout=100)
> ../toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/multiprocessing/pool.py:572:
>  in get
>     raise self._value
> E   AssertionError: Query 524559e2ab1655af:14cdb06300000000 failed:
> E     AnalysisException: Table does not exist: 
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11
> E     
> E     
> E   assert <bound method type.is_acceptable_error of <class 
> 'test_concurrent_ddls.TestConcurrentDdls'>>('Query 
> 524559e2ab1655af:14cdb06300000000 failed:\nAnalysisException: Table does not 
> exist: 
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11\n\n',
>  False)
> E    +  where <bound method type.is_acceptable_error of <class 
> 'test_concurrent_ddls.TestConcurrentDdls'>> = 
> <test_concurrent_ddls.TestConcurrentDdls object at 
> 0x7ff4623e7d50>.is_acceptable_error{noformat}
> Checking the coordinator logs, it happens arround 22:57:13.984344
> {noformat}
> I20250819 22:57:13.984344 1847783 jni-util.cc:321] 
> 524559e2ab1655af:14cdb06300000000] 
> org.apache.impala.common.AnalysisException: Table does not exist: 
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11{noformat}
> Checking the catalogd logs, EventProcessor is paused at event id 38012 by a 
> global reset. It has fetched a batch of 22 events starting from event id 
> 38013.
> {noformat}
> I20250819 22:57:13.927796 1846530 MetastoreEventsProcessor.java:1160] 
> Received 22 events. First event id: 38013.
> I20250819 22:57:13.930258 1848179 JniUtil.java:167] 
> ec43c8489011d054:946e984c00000000] resetMetadata request: INVALIDATE ALL 
> issued by ubuntu
> ...
> I20250819 22:57:13.931205 1848179 MetastoreEventsProcessor.java:973] 
> ec43c8489011d054:946e984c00000000] Event processing is paused. Last synced 
> event id is 38012{noformat}
> After the reset finishes, EventProecssor is started at event id 38034. But it 
> continues to processed the fetched batch. This is something we should fix.
> {noformat}
> I20250819 22:57:13.954586 1848179 MetastoreEventsProcessor.java:1062] 
> ec43c8489011d054:946e984c00000000] Metastore event processing restarted. Last 
> synced event id was updated from 38012 to 38034{noformat}
> While it processing event 38018, the log indicates the createEventId is -1:
> {noformat}
> I20250819 22:57:13.965526 1846530 CatalogOpExecutor.java:883] EventId: 38018 
> Table 
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11 
> was not added since it already exists in catalog.
> ...
> W20250819 22:57:13.966677 1846530 CatalogOpExecutor.java:886] Existing table 
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11 
> create event Id: -1 does not match the event id: 38018{noformat}
> This leads to the table being dropped later by event 38027 which we should 
> skip:
> {noformat}
> I20250819 22:57:13.977993 1846530 Table.java:246] createEventId_ for table: 
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11_2
>  set to: 38027
> I20250819 22:57:13.978402 1846530 Table.java:261] lastSyncedEventId_ for 
> table: 
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11_2
>  set from -1 to 38027
> I20250819 22:57:13.978508 1846530 MetastoreEvents.java:860] EventId: 38027 
> EventType: ALTER_TABLE Target: 
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11_2.
>  Removed table 
> test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11{noformat}
> The table is added by reset() which sets createEventId to -1 for all tables. 
> We should at least use the latest event id before reset() to avoid such issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to