[ 
https://issues.apache.org/jira/browse/IMPALA-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18011980#comment-18011980
 ] 

ASF subversion and git services commented on IMPALA-14283:
----------------------------------------------------------

Commit aec7380b7531096cf4ec178b1ea0d9c43f02f531 in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=aec7380b7 ]

IMPALA-14283: Invalidate the cache when served by a new catalogd

Before this patch, coordinator just invalidates the catalog cache when
witness the catalog service id changes in DDL/DML responses or
statestore catalog updates. This is enough in the legacy catalog mode
since these are the only ways that coordinator gets metadata from
catalogd. However, in local catalog mode, coordinator sends
getPartialCatalogObject requests to fetch metadata from catalogd. If the
request is now served by a new catalogd (e.g. due to HA failover),
coordinator should invalidate its catalog cache in case catalog version
overlaps on the same table and unintentionally reuse stale metadata.

To ensure performance, catalogServiceIdLock_ in CatalogdMetaProvider is
refactored to be a ReentrantReadWriteLock. Most of the usages on it just
need the read lock.

This patch also adds the catalog service id in the profile.

Tests:
 - Ran test_warmed_up_metadata_failover_catchup 50 times.
 - Ran FE tests: CatalogdMetaProviderTest and LocalCatalogTest.
 - Ran CORE tests

Change-Id: I751e43f5d594497a521313579defc5b179dc06ce
Reviewed-on: http://gerrit.cloudera.org:8080/23236
Reviewed-by: Riza Suminto <[email protected]>
Tested-by: Quanlong Huang <[email protected]>


> Coordinator could use stale metadata in catalogd warm failover
> --------------------------------------------------------------
>
>                 Key: IMPALA-14283
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14283
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>         Attachments: catalogd.INFO, catalogd_node1.INFO, impalad.INFO, 
> show_parts_profile.txt, statestored.INFO
>
>
> There is a corner case in local catalog mode that when catalogd warm failover 
> finishes, coordinator fetches the new table object from the new active 
> catalogd and it might coincidently "activate" the stale metadata in the 
> cache. This happens when matching all the following conditions:
>  * The new table object has a catalog version that has been used in the 
> previous active catalogd and it was assigned to the same table name.
>  * The coordinator has loaded table metadata associated to that version from 
> the previous active catalogd, e.g. partition list, partitions or column stats 
> are loaded and still in the cache (not aged out or evicted yet).
>  * The initial catalog update of the new active catalogd hasn't arrived yet. 
> There is some delay when statestore broadcast the updates. If the statestore 
> update arrives before the query planning, we are safe since processing the 
> update will clear the cache.
> This bug is found by *TestCatalogdHA.test_warmed_up_metadata_failover_catchup*
> {code:python}
> tests/custom_cluster/test_catalogd_ha.py:640: in 
> test_warmed_up_metadata_failover_catchup                                      
>                                                                               
>          
>     db, self._add_new_partition, self._verify_new_partition)                  
>                                                                               
>                                                           
> tests/custom_cluster/test_catalogd_ha.py:762: in 
> _test_metadata_after_failover                                                 
>                                                                               
>          
>     assert str(active_catalogd.service.service_port) in str(e)                
>                                                                               
>                                                           
> E   assert '26001' in "assert 'year=2025/month=1' in 
> 'Total\\t\\t-1\\t0\\t0B\\t0B\\t\\t\\t\\t\\t'"                                 
>                                                                               
>      
> E    +  where '26001' = str(26001)                                            
>                                                                               
>                                                           
> E    +    where 26001 = <tests.common.impala_service.CatalogdService object 
> at 0x7f5d8c24fd10>.service_port                                               
>                                                             
> E    +      where <tests.common.impala_service.CatalogdService object at 
> 0x7f5d8c24fd10> = <CatalogdProcess PID: None 
> (/root/Impala-upstream/be/build/latest/service/cata...te_store_subscriber_port=23021
>  -webserver_
> port=25021 -enable_catalogd_ha=true)>.service                                 
>                                                                               
>                                                           
> E    +  and   "assert 'year=2025/month=1' in 
> 'Total\\t\\t-1\\t0\\t0B\\t0B\\t\\t\\t\\t\\t'" = str(AssertionError(u"assert 
> 'year=2025/month=1' in 'Total\\t\\t-1\\t0\\t0B\\t0B\\t\\t\\t\\t\\t'",)){code}
> The test runs an AddPartition statement and kills the active catalogd, then 
> runs ShowPartitions in the new active catalogd. It fails since coordinator 
> returns an empty partition list. Test logs:
> {noformat}
> -- 2025-07-31 17:10:00,755 INFO     MainThread: hs2: executing against Impala 
> at quanlong-ha-test.vpc.cloudera.com:21050. session: 
> ea4e2fca5d5b35f3:c51bd83bdb2417a9 main_cursor: True user: None                
>      
>                                                                               
>                                                                               
>                                                           
> alter table warmup_test_db.tbl add partition(year=2025, month=1);             
>                                                                               
>                                                           
>                                                                               
>                                                                               
>                                                           
> -- 2025-07-31 17:10:01,119 INFO     MainThread: 
> b84a8a27014a3340:af77289200000000: query started                              
>                                                                               
>           
> -- 2025-07-31 17:10:01,120 INFO     MainThread: 
> b84a8a27014a3340:af77289200000000: getting log for operation                  
>                                                                               
>           
> -- 2025-07-31 17:10:01,120 INFO     MainThread: 
> b84a8a27014a3340:af77289200000000: getting runtime profile operation          
>                                                                               
>           
> -- 2025-07-31 17:10:01,120 INFO     MainThread: 
> b84a8a27014a3340:af77289200000000: closing query for operation                
>                                                                               
>           
> -- 2025-07-31 17:10:01,147 INFO     MainThread: Found PID 102378 for 
> /root/Impala-upstream/be/build/latest/service/catalogd -logbufsecs=5 -v=1 
> -max_log_files=10 -log_rotation_match_pid=true -log_filename=catalogd_n
> ode1 -log_dir=/tmp/ -kudu_master_hosts localhost --catalog_topic_mode=minimal 
> --catalogd_ha_reset_metadata_on_failover=false 
> --debug_actions=catalogd_event_processing_delay:SLEEP@1000 
> --enable_reload_events=true --
> warmup_tables_config_file=file:///root/Impala-upstream/testdata/data/warmup_test_config.txt
>  -catalog_service_port=26001 -state_store_subscriber_port=23021 
> -webserver_port=25021 -enable_catalogd_ha=true             
> -- 2025-07-31 17:10:01,172 INFO     MainThread: Killing <CatalogdProcess PID: 
> 102378 (/root/Impala-upstream/be/build/latest/service/catalogd -logbufsecs=5 
> -v=1 -max_log_files=10 -log_rotation_match_pid=true -log_fi
> lename=catalogd_node1 -log_dir=/tmp/ -kudu_master_hosts localhost 
> --catalog_topic_mode=minimal --catalogd_ha_reset_metadata_on_failover=false 
> --debug_actions=catalogd_event_processing_delay:SLEEP@1000 --enable_relo
> ad_events=true 
> --warmup_tables_config_file=file:///root/Impala-upstream/testdata/data/warmup_test_config.txt
>  -catalog_service_port=26001 -state_store_subscriber_port=23021 
> -webserver_port=25021 -enable_catalogd_ha=
> true)> with signal 9                                                          
>                                                                               
>                                                           
> -- 2025-07-31 17:10:01,177 INFO     MainThread: Getting metric: 
> catalog-server.active-status from quanlong-ha-test.vpc.cloudera.com:25020     
>              
> -- 2025-07-31 17:10:01,180 INFO     MainThread: Waiting for metric value 
> 'catalog-server.active-status'=True. Current value: False. total_wait: 0s
> -- 2025-07-31 17:10:01,180 INFO     MainThread: Sleeping 1s before next 
> retry.                                                                        
>                                                                 
> -- 2025-07-31 17:10:02,181 INFO     MainThread: Getting metric: 
> catalog-server.active-status from quanlong-ha-test.vpc.cloudera.com:25020     
>                                                                         
> -- 2025-07-31 17:10:02,184 INFO     MainThread: Metric 
> 'catalog-server.active-status' has reached desired value: True. total_wait: 
> 1.00396800041s
> -- 2025-07-31 17:10:02,188 INFO     MainThread: hs2: executing against Impala 
> at quanlong-ha-test.vpc.cloudera.com:21050. session: 
> ea4e2fca5d5b35f3:c51bd83bdb2417a9 main_cursor: True user: None
>                                                                               
>                                                                               
>                                                           
> show partitions warmup_test_db.tbl;
>                                                                               
>                                                                               
>                                                           
> -- 2025-07-31 17:10:03,737 INFO     MainThread: 
> 664e4d97f4712dfb:e4d0877500000000: query started           
> -- 2025-07-31 17:10:03,738 INFO     MainThread: 
> 664e4d97f4712dfb:e4d0877500000000: getting log for operation
> -- 2025-07-31 17:10:03,738 INFO     MainThread: 
> 664e4d97f4712dfb:e4d0877500000000: getting runtime profile operation   
> -- 2025-07-31 17:10:03,738 INFO     MainThread: 
> 664e4d97f4712dfb:e4d0877500000000: closing query for operation      
> -- 2025-07-31 17:10:03,739 INFO     MainThread: partition result: 
> ['Total\t\t-1\t0\t0B\t0B\t\t\t\t\t']{noformat}
> Profile of ShowPartitions indicates some stale metadata are used:
> {noformat}
>       Original Table Versions: warmup_test_db.tbl, 314, 1754007002807, Thu 
> Jul 31 17:10:02 PDT 2025
>        - CatalogFetch.ColumnStats.Hits: 13  <--- This shouldn't hit
>        - CatalogFetch.ColumnStats.Misses: 0
>        - CatalogFetch.ColumnStats.Requests: 13
>        - CatalogFetch.ColumnStats.Time: 0
>        - CatalogFetch.DatabaseList.Hits: 1
>        - CatalogFetch.DatabaseList.Misses: 0
>        - CatalogFetch.DatabaseList.Requests: 1
>        - CatalogFetch.DatabaseList.Time: 0
>        - CatalogFetch.PartitionLists.Hits: 1  <--- This shouldn't hit
>        - CatalogFetch.PartitionLists.Misses: 0
>        - CatalogFetch.PartitionLists.Requests: 1
>        - CatalogFetch.PartitionLists.Time: 0
>        - CatalogFetch.RPCs.Bytes: 1.78 KB (1818)
>        - CatalogFetch.RPCs.Requests: 2
>        - CatalogFetch.RPCs.Time: 1s531ms
>        - CatalogFetch.StorageLoad.Time: 22ms
>        - CatalogFetch.TableList.Hits: 1
>        - CatalogFetch.TableList.Misses: 1
>        - CatalogFetch.TableList.Requests: 2
>        - CatalogFetch.TableList.Time: 1s516ms
>        - CatalogFetch.Tables.Hits: 0
>        - CatalogFetch.Tables.Misses: 1
>        - CatalogFetch.Tables.Requests: 1
>        - CatalogFetch.Tables.Time: 15ms
>        - ExecutorGroupsConsidered: 1 (1){noformat}
> In logs of the new active catalogd (catalogd.INFO), the catalog version of 
> the table is 314:
> {noformat}
> I20250731 17:10:02.760269 103771 CatalogOpExecutor.java:4872] Found 1/1 
> partitions to add in table warmup_test_db.tbl from event 888
> I20250731 17:10:02.807034 103771 HdfsTable.java:781] Loaded file and block 
> metadata for warmup_test_db.tbl partitions: year=2025/month=1. Time taken: 
> 27.856ms
> I20250731 17:10:02.807621 103771 MetastoreEvents.java:860] EventId: 888 
> EventType: ADD_PARTITION Target: warmup_test_db.tbl. Successfully added 1 
> partitions to table warmup_test_db.tbl
> I20250731 17:10:02.807871 103771 MetastoreEventsProcessor.java:1578] Time 
> elapsed in processing event batch: 1s156ms
> I20250731 17:10:03.673661 103802 JniUtil.java:167] getEventProcessorSummary 
> request: Getting event processor summary
> I20250731 17:10:03.674260 103802 JniUtil.java:178] Finished 
> getEventProcessorSummary request: Getting event processor summary. Time 
> spent: 1ms
> I20250731 17:10:03.674295 103802 catalog-server.cc:978] EventProcessor is 
> synced up with HMS event id during failover: 888
> I20250731 17:10:03.674309 103802 catalog-server.cc:887] This catalogd 
> instance is changed to active status
> I20250731 17:10:03.677537 103773 JniUtil.java:167] getCatalogDelta request: 
> Getting catalog delta from version 0
> I20250731 17:10:03.687700 103773 catalog-server.cc:1615] Collected update: 
> 2:DATABASE:default, version=2, original size=48, compressed size=49
> ...
> I20250731 17:10:03.720110 103773 CatalogServiceCatalog.java:1965] Skipped 0 
> partitions of table warmup_test_db.tbl in the incremental update
> I20250731 17:10:03.720285 103773 CatalogServiceCatalog.java:1978] Collected 1 
> partition update(s): 2:HDFS_PARTITION:warmup_test_db.tbl:year=2025/month=1, 
> version=314, original size=100, compressed size=101
> I20250731 17:10:03.720324 103773 catalog-server.cc:1615] Collected update: 
> 2:TABLE:warmup_test_db.tbl, version=314, original size=65, compressed 
> size=66{noformat}
> In the previous catalogd logs (catalogd_node1.INFO), the table version before 
> AddPartition is also 314!
> {noformat}
> I20250731 17:09:58.553951 103647 HdfsTable.java:1256] Loading metadata for 
> table definition and all partition(s) of warmup_test_db.tbl (needed by 
> coordinator)
> I20250731 17:09:58.600651 103647 HdfsTable.java:1919] Loaded 13 columns from 
> HMS. Actual columns: 13
> I20250731 17:09:58.623989 103647 HdfsTable.java:2987] Load Valid Write Id 
> List Done. Time taken: 24.253us
> I20250731 17:09:58.624092 103647 HdfsTable.java:1303] Fetching partition 
> metadata from the Metastore: warmup_test_db.tbl
> I20250731 17:09:58.628580 103647 MetaStoreUtil.java:191] Fetching 0 
> partitions for: warmup_test_db.tbl using partition batch size: 1000
> I20250731 17:09:58.628685 103647 HdfsTable.java:1310] Fetched partition 
> metadata from the Metastore: warmup_test_db.tbl
> I20250731 17:09:58.649619 103647 HdfsTable.java:781] Loaded file and block 
> metadata for warmup_test_db.tbl partitions: . Time taken: 3.319ms
> I20250731 17:09:58.651077 103647 Table.java:1106] last refreshed event id for 
> table: warmup_test_db.tbl set to: -1
> I20250731 17:09:58.651185 103647 TableLoader.java:187] Loaded metadata for: 
> warmup_test_db.tbl (129ms)
> I20250731 17:09:58.685761 102954 JniUtil.java:167] getCatalogDelta request: 
> Getting catalog delta from version 313
> I20250731 17:09:58.693717 102954 CatalogServiceCatalog.java:1965] Skipped 0 
> partitions of table warmup_test_db.tbl in the incremental update
> I20250731 17:09:58.693792 102954 catalog-server.cc:1615] Collected update: 
> 2:TABLE:warmup_test_db.tbl, version=314, original size=65, compressed 
> size=66{noformat}
> So coordinator has loaded the empty partition list of it in cache. It's 
> associated with table name "warmup_test_db.tbl" and catalog version 314. Then 
> coincidently reused after catalogd warm failover.
> We should add the catalog service id in the getPartialCatalogObject response 
> and coordinator should clear the cache when the id changes, just like what we 
> do in updateCatalogCache() -> 
> [witnessCatalogServiceId()|https://github.com/apache/impala/blob/5414c3009890a736901b6ae3b56b9953b1992de7/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java#L1614].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to