[ https://issues.apache.org/jira/browse/IMPALA-14330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang resolved IMPALA-14330.
-------------------------------------
    Fix Version/s: Impala 5.0.0
       Resolution: Fixed

Resolving this. Thanks [~rizaon] and [~MikaelSmith] for the review!

> Global INVALIDATE METADATA should set a valid createEventId instead of -1 on all tables
> ---------------------------------------------------------------------------------------
>
>                 Key: IMPALA-14330
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14330
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>             Fix For: Impala 5.0.0
>
>         Attachments: catalogd.ip-172-31-59-7.ubuntu.log.INFO.20250819-225701.1846090,
>                      impalad.ip-172-31-59-7.ubuntu.log.WARNING.20250819-225701.1846122
>
>
> Similar to IMPALA-14307, using -1 as the createEventId leads to the table
> being dropped by stale events, e.g. ALTER_TABLE RENAME events. We should
> avoid doing so in general, or at least in catalog reset (i.e. global
> INVALIDATE METADATA), to make the tests in TestConcurrentDdls stable.
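
A minimal sketch of why a createEventId of -1 is hazardous. It assumes, as a simplification and not as a description of Impala's actual code, that the event processor treats an event as stale when its id is not greater than the table's createEventId. The names `Table`, `is_stale`, `reset_catalog`, and `apply_rename_event` are hypothetical; the event ids 38027 and 38034 are borrowed from the log excerpts quoted in this issue.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Table:
    name: str
    # -1 models "unknown": no event can ever be recognized as older than it.
    create_event_id: int = -1


def is_stale(event_id: int, table: Table) -> bool:
    """Assumed rule: an event is stale if it predates the table's creation."""
    return event_id <= table.create_event_id


def reset_catalog(table_names: Iterable[str], create_event_id: int) -> Dict[str, Table]:
    """Hypothetical stand-in for a global reset that reloads all tables."""
    return {name: Table(name, create_event_id) for name in table_names}


def apply_rename_event(catalog: Dict[str, Table], event_id: int,
                       old_name: str, new_name: str) -> bool:
    """Apply a rename event; return True if the old table was removed."""
    table = catalog.get(old_name)
    if table is None or is_stale(event_id, table):
        return False  # nothing to rename, or the event is skipped as stale
    del catalog[old_name]
    catalog[new_name] = Table(new_name, create_event_id=event_id)
    return True


if __name__ == "__main__":
    # Reset stamps -1: the stale rename (38027) wrongly removes test_11.
    broken = reset_catalog(["test_11"], -1)
    print(apply_rename_event(broken, 38027, "test_11", "test_11_2"), sorted(broken))

    # Reset stamps the latest event id (38034): the same event is skipped.
    fixed = reset_catalog(["test_11"], 38034)
    print(apply_rename_event(fixed, 38027, "test_11", "test_11_2"), sorted(fixed))
```

Under these assumptions, the first case removes `test_11` even though the rename predates the reset, while the second case leaves the catalog untouched.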
> One recent failure is
> [https://jenkins.impala.io/job/ubuntu-20.04-from-scratch/6409]
> {noformat}
> custom_cluster/test_concurrent_ddls.py:95: in test_local_catalog_ddls_with_invalidate_metadata_unlock_gap
>     self._run_ddls_with_invalidation(unique_database, sync_ddl=False)
> custom_cluster/test_concurrent_ddls.py:179: in _run_ddls_with_invalidation
>     worker[i].get(timeout=100)
> ../toolchain/toolchain-packages-gcc10.4.0/python-2.7.16/lib/python2.7/multiprocessing/pool.py:572: in get
>     raise self._value
> E AssertionError: Query 524559e2ab1655af:14cdb06300000000 failed:
> E AnalysisException: Table does not exist: test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11
> E
> E
> E assert <bound method type.is_acceptable_error of <class 'test_concurrent_ddls.TestConcurrentDdls'>>('Query 524559e2ab1655af:14cdb06300000000 failed:\nAnalysisException: Table does not exist: test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11\n\n', False)
> E + where <bound method type.is_acceptable_error of <class 'test_concurrent_ddls.TestConcurrentDdls'>> = <test_concurrent_ddls.TestConcurrentDdls object at 0x7ff4623e7d50>.is_acceptable_error{noformat}
> Checking the coordinator logs, the failure happens around 22:57:13.984344
> {noformat}
> I20250819 22:57:13.984344 1847783 jni-util.cc:321] 524559e2ab1655af:14cdb06300000000] org.apache.impala.common.AnalysisException: Table does not exist: test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11{noformat}
> Checking the catalogd logs, EventProcessor is paused at event id 38012 by a
> global reset. It has already fetched a batch of 22 events starting from
> event id 38013.
> {noformat}
> I20250819 22:57:13.927796 1846530 MetastoreEventsProcessor.java:1160] Received 22 events. First event id: 38013.
> I20250819 22:57:13.930258 1848179 JniUtil.java:167] ec43c8489011d054:946e984c00000000] resetMetadata request: INVALIDATE ALL issued by ubuntu
> ...
> I20250819 22:57:13.931205 1848179 MetastoreEventsProcessor.java:973] ec43c8489011d054:946e984c00000000] Event processing is paused. Last synced event id is 38012{noformat}
> After the reset finishes, EventProcessor is restarted at event id 38034, but
> it continues to process the already fetched batch. This is something we
> should fix.
> {noformat}
> I20250819 22:57:13.954586 1848179 MetastoreEventsProcessor.java:1062] ec43c8489011d054:946e984c00000000] Metastore event processing restarted. Last synced event id was updated from 38012 to 38034{noformat}
> While it is processing event 38018, the log indicates the createEventId is -1:
> {noformat}
> I20250819 22:57:13.965526 1846530 CatalogOpExecutor.java:883] EventId: 38018 Table test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11 was not added since it already exists in catalog.
> ...
> W20250819 22:57:13.966677 1846530 CatalogOpExecutor.java:886] Existing table test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11 create event Id: -1 does not match the event id: 38018{noformat}
> This leads to the table being dropped later by event 38027, which we should
> skip:
> {noformat}
> I20250819 22:57:13.977993 1846530 Table.java:246] createEventId_ for table: test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11_2 set to: 38027
> I20250819 22:57:13.978402 1846530 Table.java:261] lastSyncedEventId_ for table: test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11_2 set from -1 to 38027
> I20250819 22:57:13.978508 1846530 MetastoreEvents.java:860] EventId: 38027 EventType: ALTER_TABLE Target: test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11_2. Removed table test_local_catalog_ddls_with_invalidate_metadata_unlock_gap_cefb705f.test_11{noformat}
> The table is added by reset() which sets createEventId to -1 for all tables.
> We should at least use the latest event id before reset() to avoid such
> issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)