Kadir Ozdemir created PHOENIX-6883:
--------------------------------------

             Summary: Phoenix metadata caching redesign
                 Key: PHOENIX-6883
                 URL: https://issues.apache.org/jira/browse/PHOENIX-6883
             Project: Phoenix
          Issue Type: Improvement
            Reporter: Kadir Ozdemir


PHOENIX-6761 improves the client side metadata caching by eliminating the 
separate cache for each connection. This improvement results in memory and 
compute savings since it eliminates copying the CQSI level cache every time a 
Phoenix connection is created, and also replaces the inefficient CQSI level 
cache implementation with Guava Cache from Google. 
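As an illustration of the kind of cache this refers to, here is a minimal 
sketch of a single CQSI level Guava Cache keyed by full table name; the value 
type, size, and expiry settings are assumptions for the example, not Phoenix's 
actual code.

{code:java}
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.concurrent.TimeUnit;

public class MetadataCacheSketch {
    // Stand-in for the table metadata Phoenix actually caches (e.g. PTable).
    static final class TableMetadata {
        final String fullTableName;
        final long lastDdlTimestamp;
        TableMetadata(String fullTableName, long lastDdlTimestamp) {
            this.fullTableName = fullTableName;
            this.lastDdlTimestamp = lastDdlTimestamp;
        }
    }

    public static void main(String[] args) {
        // One CQSI-level cache shared by all connections, keyed by full table name.
        Cache<String, TableMetadata> metadataCache = CacheBuilder.newBuilder()
                .maximumSize(10_000)                      // bound the cache size
                .expireAfterAccess(30, TimeUnit.MINUTES)  // evict idle entries
                .build();

        metadataCache.put("MY_SCHEMA.MY_TABLE",
                new TableMetadata("MY_SCHEMA.MY_TABLE", System.currentTimeMillis()));
        TableMetadata cached = metadataCache.getIfPresent("MY_SCHEMA.MY_TABLE");
        System.out.println(cached == null ? "cache miss" : cached.fullTableName);
    }
}
{code}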

Despite this improvement, the overall metadata caching architecture calls for a 
redesign. This is because every operation in Phoenix needs to make multiple 
RPCs to the metadata servers for the SYSTEM.CATALOG table (please see 
PHOENIX-6860) to ensure the latest metadata changes are visible to clients. 
These constant RPCs make the region servers serving SYSTEM.CATALOG a hot spot 
and thus lead to poor performance and availability issues.

The UPDATE_CACHE_FREQUENCY configuration parameter specifies how frequently the 
client cache is updated. However, setting this parameter to a non-zero value 
results in stale caching, which can cause data integrity issues. For example, 
if an index table creation is not visible to the client, Phoenix would skip 
updating the index table in the write path. That is why this parameter is 
typically set to zero. However, this defeats the purpose of client side 
metadata caching.
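For reference, a small JDBC sketch of how UPDATE_CACHE_FREQUENCY is set per 
table; the connection URL and table name are placeholders for illustration.

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class UpdateCacheFrequencyExample {
    public static void main(String[] args) throws Exception {
        // The ZooKeeper quorum in the URL is a placeholder for your cluster.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181");
             Statement stmt = conn.createStatement()) {
            // UPDATE_CACHE_FREQUENCY accepts ALWAYS, NEVER, or a duration in
            // milliseconds; 900000 below lets the client reuse cached metadata
            // for up to 15 minutes before going back to SYSTEM.CATALOG.
            stmt.execute("CREATE TABLE IF NOT EXISTS EXAMPLE_TABLE ("
                    + " ID BIGINT NOT NULL PRIMARY KEY,"
                    + " VAL VARCHAR)"
                    + " UPDATE_CACHE_FREQUENCY = 900000");
            // The property can also be changed later.
            stmt.execute("ALTER TABLE EXAMPLE_TABLE SET UPDATE_CACHE_FREQUENCY = 'ALWAYS'");
        }
    }
}
{code}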

The redesign of the metadata caching architecture directly addresses this issue 
by making sure that the client metadata cache is always used (that is, 
UPDATE_CACHE_FREQUENCY is set to NEVER) while still ensuring data integrity. 
This is achieved by three main changes. 

The first change is to introduce server side metadata caching on all region 
servers. Currently, server side metadata caching is used only on the region 
servers serving SYSTEM.CATALOG. This metadata cache should be strongly 
consistent, that is, metadata updates should include invalidating the 
corresponding entries in the server side caches. This ensures the server 
caches do not become stale.
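A minimal sketch of what such a strongly consistent, per region server cache 
with DDL-driven invalidation could look like; the class and method names are 
hypothetical, not an existing Phoenix API.

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical per region server metadata cache; names are illustrative only.
public class ServerMetadataCacheSketch {
    // Full table name -> LAST_DDL_TIMESTAMP last seen by this region server.
    private final ConcurrentMap<String, Long> lastDdlTimestamps = new ConcurrentHashMap<>();

    // Populated when the region server loads metadata, e.g. on a cache miss.
    public void put(String fullTableName, long lastDdlTimestamp) {
        lastDdlTimestamps.put(fullTableName, lastDdlTimestamp);
    }

    public Long getLastDdlTimestamp(String fullTableName) {
        return lastDdlTimestamps.get(fullTableName);
    }

    // Invoked as part of every DDL so the entry is never stale; the next lookup
    // reloads the latest metadata from SYSTEM.CATALOG and repopulates the cache.
    public void invalidate(String fullTableName) {
        lastDdlTimestamps.remove(fullTableName);
    }
}
{code}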

The second change is that the Phoenix client passes the LAST_DDL_TIMESTAMP 
table attribute along with scan and mutation operations to the server regions 
(more accurately, to the Phoenix coprocessors). The Phoenix coprocessors would 
then check the timestamp on a given operation against the timestamp in their 
server side cache to validate that the client did not use stale metadata when 
it prepared the operation. If the client did use stale metadata, the 
coprocessor would return an exception (this exception can be called 
StaleClientMetadataCacheException) to the client.
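A sketch of the validation a coprocessor could perform; the exception name 
follows the proposal above, while the class, method, and parameter names are 
illustrative assumptions.

{code:java}
// Hypothetical check a coprocessor could run before serving a scan or mutation.
// The exception name follows the proposal; everything else is illustrative.
public class ClientMetadataValidatorSketch {

    public static class StaleClientMetadataCacheException extends Exception {
        public StaleClientMetadataCacheException(String message) {
            super(message);
        }
    }

    // clientLastDdlTimestamp: LAST_DDL_TIMESTAMP the client attached to the request.
    // serverLastDdlTimestamp: timestamp held in the region server's metadata cache.
    public static void validate(String fullTableName,
                                long clientLastDdlTimestamp,
                                long serverLastDdlTimestamp)
            throws StaleClientMetadataCacheException {
        if (clientLastDdlTimestamp < serverLastDdlTimestamp) {
            // The client prepared the operation with metadata that predates the
            // latest DDL, so reject it and let the client refresh and retry.
            throw new StaleClientMetadataCacheException(
                    "Stale client metadata for " + fullTableName
                            + ": client=" + clientLastDdlTimestamp
                            + ", server=" + serverLastDdlTimestamp);
        }
    }
}
{code}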

The third change is that upon receiving StaleClientMetadataCacheException the 
Phoenix client makes an RPC call to the metadata server to update the client 
cache, reconstructs the operation with the updated cache, and retries the 
operation.
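An illustrative client side retry pattern under these assumptions; the 
exception, the refresher interface, and the method names are hypothetical 
placeholders, not existing Phoenix APIs.

{code:java}
import java.util.concurrent.Callable;

// Illustrative client side retry pattern; names are placeholders only.
public class StaleMetadataRetrySketch {

    public static class StaleClientMetadataCacheException extends Exception {}

    public interface MetadataRefresher {
        // Would make the RPC to the SYSTEM.CATALOG region server and update the
        // client side metadata cache for the given table.
        void refresh(String fullTableName) throws Exception;
    }

    public static <T> T executeWithRetry(String fullTableName,
                                         Callable<T> operation,
                                         MetadataRefresher refresher,
                                         int maxRetries) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                // The operation is (re)built from the current client metadata
                // cache each time it runs.
                return operation.call();
            } catch (StaleClientMetadataCacheException e) {
                if (attempt >= maxRetries) {
                    throw e;
                }
                // Only a stale-metadata rejection costs an RPC to the metadata
                // server before rebuilding and retrying the operation.
                refresher.refresh(fullTableName);
            }
        }
    }
}
{code}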

This redesign would require updating the client and server metadata caches 
only when metadata is stale, instead of updating the client metadata cache for 
each scan or mutation operation. This would eliminate hot spotting on the 
metadata servers and thus the poor performance and availability issues caused 
by it.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
