[GitHub] [shardingsphere] TeslaCN opened a new issue, #21223: Implement sharding cache plugin

GitBox Tue, 27 Sep 2022 03:55:37 -0700


TeslaCN opened a new issue, #21223:
URL: https://github.com/apache/shardingsphere/issues/21223


   ## Cache ideas
   Caching the routing results computed by the data sharding feature reduces 
ShardingSphere's CPU usage by trading space for time.
   
   ### Cache premise
   
   #### The sharding algorithm guarantees consistent input and output
   
   Consider defining a concept, CacheableShardingAlgorithm, that marks an 
algorithm's results as cacheable. The premise is that the same input always 
yields the same output, regardless of circumstances or time. The built-in Mod 
and Range algorithms are examples.
   
   Examples of non-cacheable sharding algorithms:
   - The built-in INLINE algorithm. Because its expression is so flexible, the 
correctness of a cached result cannot be guaranteed. The INLINE algorithm is 
also not recommended for scenarios with demanding performance requirements.
   - Algorithms whose output depends on time, for example ones that call 
`new Date()` or `LocalDate.now()`.
   - Algorithms whose input does not map to exactly one output. For example, a 
custom algorithm:
     - `Id == 0 -> ds_0`
     - `Id == 1 -> ds_1`
     - `Id == 2 -> ds_2, ds_3`
   - ...
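   
   A minimal sketch of how such a marker could separate the two groups. 
Everything here is hypothetical: `SimpleShardingAlgorithm`, 
`ModAlgorithmSketch`, `DayParityAlgorithmSketch` and `CacheabilityCheck` are 
simplified stand-ins rather than ShardingSphere's SPI, and only the name 
CacheableShardingAlgorithm comes from the proposal.
   
   ```java
   import java.time.LocalDate;
   
   // Hypothetical marker: implementing it promises that equal inputs always yield equal outputs.
   interface CacheableShardingAlgorithm {
   }
   
   // Hypothetical minimal algorithm contract, not the real ShardingSphere SPI.
   interface SimpleShardingAlgorithm {
       String doSharding(long shardingValue);
   }
   
   // Deterministic: the result depends only on the input, so it may carry the marker.
   final class ModAlgorithmSketch implements SimpleShardingAlgorithm, CacheableShardingAlgorithm {
       private final int shardingCount;
   
       ModAlgorithmSketch(int shardingCount) {
           this.shardingCount = shardingCount;
       }
   
       @Override
       public String doSharding(long shardingValue) {
           return "ds_" + Math.floorMod(shardingValue, (long) shardingCount);
       }
   }
   
   // Time-dependent: the same input may route differently on another day, so no marker.
   final class DayParityAlgorithmSketch implements SimpleShardingAlgorithm {
       @Override
       public String doSharding(long shardingValue) {
           return "ds_" + Math.floorMod(shardingValue + LocalDate.now().getDayOfYear(), 2L);
       }
   }
   
   final class CacheabilityCheck {
       // Results are cached only for algorithms that opt in through the marker.
       static boolean isCacheable(SimpleShardingAlgorithm algorithm) {
           return algorithm instanceof CacheableShardingAlgorithm;
       }
   }
   ```
   
   An opt-in marker keeps custom algorithms non-cacheable unless their author 
explicitly declares them deterministic.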
   
   #### Parameter values are not generated by functions
   
   For example:
   - With datetime-based sharding, a function such as `now()` may be used on 
the sharding key.
     Traversing the SQL to detect `now()` is not easy. However, `now()` is the 
only such function that ShardingSphere currently processes, so if all sharding 
key values are of a Number type, none of them was generated by `now()`.
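   
   The Number-type shortcut can be sketched as a plain type check. 
`ShardingValueTypeCheck` and `allNumbers` are hypothetical names used for 
illustration only.
   
   ```java
   import java.util.List;
   
   final class ShardingValueTypeCheck {
   
       // Hypothetical helper: true only if every sharding key value is a Number,
       // which rules out values produced by a datetime function such as now().
       static boolean allNumbers(List<?> shardingValues) {
           for (Object value : shardingValues) {
               if (!(value instanceof Number)) {
                   return false;
               }
           }
           return true;
       }
   }
   ```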
   
   #### The sharded table must hit a single data node
   
   If the route does not hit a single data node, the overhead at the network 
and database levels is likely to dominate. ShardingSphere's own overhead is 
then comparatively small, so caching the routing calculation brings little 
performance improvement.
   
   ### DML cache conditions
   
   For broadcast table operations, INSERT/DELETE/UPDATE routing can be cached 
directly. SELECT routing is not cached because the kernel layer already 
optimizes it.
   
   #### INSERT
   
   - The statement is not an INSERT ... SELECT
   - The statement has no ON DUPLICATE KEY UPDATE clause
   - No ID needs to be generated:
     GeneratedKeyContext is empty or generated is false
   - VALUES contains only one set of values
   - Every value in VALUES is one of the following types:
     - literal: LiteralExpressionSegment
     - placeholder: ParameterMarkerExpressionSegment
   
   #### DELETE / UPDATE
   
   - The SQL involves only one table
   - If it is a sharded table, the sharding algorithm must be cacheable
   - If it is a broadcast table, caching is allowed
   
   #### SELECT
   
   - Statements that query only a broadcast table are not cached
   - Tables in a multi-table join must all be binding tables or broadcast tables
   
   ### Cache cleaning
   
   #### Passive cleanup
   - Make sure the cache can be freed automatically when memory is low.
   
   #### Active cleanup
   - Clear the cache when metadata changes.
   - Clear the cache when rules change.
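   
   Both cleanup modes can be illustrated with a plain JDK sketch: values held 
through `SoftReference` may be reclaimed by the garbage collector under memory 
pressure (passive), and `invalidateAll()` is what a metadata or rule change 
listener would call (active). `SoftValueCacheSketch` is a hypothetical class 
for illustration; it is not the cache implementation the plugin would 
necessarily use.
   
   ```java
   import java.lang.ref.SoftReference;
   import java.util.LinkedHashMap;
   import java.util.Map;
   
   // Hypothetical bounded cache with softly referenced values.
   final class SoftValueCacheSketch<K, V> {
   
       private final Map<K, SoftReference<V>> entries;
   
       SoftValueCacheSketch(int initialCapacity, final int maximumSize) {
           // An access-ordered LinkedHashMap drops the least recently used entry beyond maximumSize.
           entries = new LinkedHashMap<K, SoftReference<V>>(initialCapacity, 0.75f, true) {
               @Override
               protected boolean removeEldestEntry(Map.Entry<K, SoftReference<V>> eldest) {
                   return size() > maximumSize;
               }
           };
       }
   
       synchronized V get(K key) {
           SoftReference<V> reference = entries.get(key);
           if (reference == null) {
               return null;
           }
           V value = reference.get();
           if (value == null) {
               // Passive cleanup: the value was reclaimed when memory ran low.
               entries.remove(key);
           }
           return value;
       }
   
       synchronized void put(K key, V value) {
           entries.put(key, new SoftReference<>(value));
       }
   
       // Active cleanup: call this when metadata or rules change.
       synchronized void invalidateAll() {
           entries.clear();
       }
   
       synchronized int entryCount() {
           return entries.size();
       }
   }
   ```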
   
   
   ## API Design: a new ShardingCacheRule
   
   The sharding cache is disabled by default.
   Configuring a ShardingCacheRule enables it.
   
   ```yaml
   - !SHARDING_CACHE
     allowedMaxSqlLength: 100
     routeCache:
       initialCapacity: 256
       maximumSize: 4096
       softValues: true
   ```
   
   ## Code design
   
   ### Enhance shardingsphere-sharding-core
   
   A new method needs to be added to ShardingConditionValue. While the routing 
logic extracts the sharding key values, it also records the index of the 
placeholder in the SQL that corresponds to each sharding key value.
   
   ```java
   import java.util.List;
   
   public interface ShardingConditionValue {
       
       // Existing methods omitted...
       
       /**
        * Get parameter marker indexes.
        *
        * @return parameter marker indexes
        */
       List<Integer> getParameterMarkerIndexes();
   }
   ```
   
   The implementation classes of this interface need to implement the new 
method, and the corresponding extraction logic needs to be added to the 
following classes:
   - WhereClauseShardingConditionEngine
   - InsertClauseShardingConditionEngine
   ### Add a new module shardingsphere-sharding-cache into 
shardingsphere-sharding-plugin
   
   ```
   shardingsphere-sharding-plugin
   ├── shardingsphere-sharding-cache
   ├── shardingsphere-sharding-cosid
   ├── shardingsphere-sharding-nanoid
   ```
   
   ### Cache entry point
   
   
https://github.com/apache/shardingsphere/blob/5df14e707e290e482878f11786ef402024e317ed/shardingsphere-infra/shardingsphere-infra-route/src/main/java/org/apache/shardingsphere/infra/route/engine/impl/PartialSQLRouteExecutor.java#L54-L69
   
   To avoid coupling with the original ShardingSQLRouter, a new implementation, 
CachedShardingSQLRouter, is added and placed in front of ShardingSQLRouter via 
OrderedSPI.
   
   Logic of CachedShardingSQLRouter:
   1. Check whether the statement satisfies the cache premise; if not, return 
an empty RouteContext;
   2. Obtain the parameter indexes of all shard keys in the statement and look 
up the SQL together with the actual shard key parameter values in the cache; 
on a miss, invoke ShardingSQLRouter to calculate the route;
   3. Check whether the routing result hits a single data node; if it does 
not, the result is not cached;
   4. Put the RouteContext into the cache and return a deep copy of the cached 
RouteContext.
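   
   The four steps can be sketched in plain Java as follows. This is a 
simplified model under stated assumptions, not the plugin's code: the route 
result is reduced to a list of data node names instead of a RouteContext, an 
empty `Optional` stands in for the empty RouteContext, and 
`CachedRouterSketch`, `CheckResult`, `checker` and `originalRouter` are 
hypothetical names.
   
   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.Collections;
   import java.util.List;
   import java.util.Map;
   import java.util.Optional;
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.function.BiFunction;
   import java.util.function.Function;
   
   // Hypothetical, simplified model of the four steps; none of these types are ShardingSphere API.
   final class CachedRouterSketch {
   
       // Result of the cacheability check for one SQL text.
       static final class CheckResult {
           final boolean cacheable;
           final List<Integer> shardKeyIndexes;
   
           CheckResult(boolean cacheable, List<Integer> shardKeyIndexes) {
               this.cacheable = cacheable;
               this.shardKeyIndexes = shardKeyIndexes;
           }
       }
   
       private final Map<String, CheckResult> checkCache = new ConcurrentHashMap<>();
       private final Map<List<Object>, List<String>> routeCache = new ConcurrentHashMap<>();
       private final Function<String, CheckResult> checker;
       private final BiFunction<String, List<Object>, List<String>> originalRouter;
   
       CachedRouterSketch(Function<String, CheckResult> checker,
                          BiFunction<String, List<Object>, List<String>> originalRouter) {
           this.checker = checker;
           this.originalRouter = originalRouter;
       }
   
       // Returns the routed data nodes, or empty to let the next router handle the statement.
       Optional<List<String>> route(String sql, List<Object> parameters) {
           // Step 1: statements that fail the cache premise are not handled here.
           CheckResult check = checkCache.computeIfAbsent(sql, checker);
           if (!check.cacheable) {
               return Optional.empty();
           }
           // Step 2: build the key from the SQL and the actual shard key values.
           List<Object> shardKeyValues = new ArrayList<>();
           for (int index : check.shardKeyIndexes) {
               shardKeyValues.add(parameters.get(index));
           }
           // A two-element list gives value-based equals/hashCode for the key.
           List<Object> key = Arrays.<Object>asList(sql, shardKeyValues);
           List<String> cached = routeCache.get(key);
           if (cached != null) {
               return Optional.of(new ArrayList<>(cached));
           }
           List<String> dataNodes = originalRouter.apply(sql, parameters);
           // Step 3: results that do not hit exactly one data node are not cached.
           if (dataNodes.size() != 1) {
               return Optional.of(dataNodes);
           }
           // Step 4: cache the result and return a copy so callers cannot mutate the cached one.
           routeCache.put(key, Collections.unmodifiableList(new ArrayList<>(dataNodes)));
           return Optional.of(new ArrayList<>(dataNodes));
       }
   }
   ```
   
   Returning a copy matters because downstream code may modify the route 
result; handing out the cached instance itself would corrupt later hits.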
   
   ### Cache structure
   
   #### Cache 1: Whether the SQL satisfies the cache conditions, plus the 
parameter-list indexes of the placeholders corresponding to the shard keys
   
   Cache 1 corresponds to steps 1 and 2 of the CachedShardingSQLRouter logic.
   
   Key:
   - SQL
   
   Value:
   - `boolean cacheable`: whether the statement is cacheable
   - `List<Integer>`: the indexes, in the parameter list, of the placeholders 
corresponding to the shard keys
   
   #### Cache 2: Route results corresponding to SQL and shard key parameters
   
   Cache 2 corresponds to steps 3 and 4 of the CachedShardingSQLRouter logic.
   
   Key:
   - SQL
   - `List<Object>`: the actual parameter values of the shard keys
   
   Value:
   - RouteContext
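   
   One possible shape for these keys and values, sketched with Java records 
(which assumes Java 16 or later; the names `CacheCheckResult` and 
`RouteCacheKey` are hypothetical). Records compare by content, so two 
executions of the same SQL with equal shard key values produce equal keys even 
when their other parameters differ.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   
   // Hypothetical value type for Cache 1: cacheability flag plus shard key placeholder indexes.
   record CacheCheckResult(boolean cacheable, List<Integer> shardKeyParameterIndexes) {
   }
   
   // Hypothetical key type for Cache 2: the SQL text plus the actual shard key values.
   record RouteCacheKey(String sql, List<Object> shardKeyValues) {
   
       // Picks only the shard key values out of the full parameter list.
       static RouteCacheKey of(String sql, CacheCheckResult check, List<Object> parameters) {
           List<Object> values = new ArrayList<>(check.shardKeyParameterIndexes().size());
           for (int index : check.shardKeyParameterIndexes()) {
               values.add(parameters.get(index));
           }
           return new RouteCacheKey(sql, values);
       }
   }
   ```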
   
   ## Steps
   
   - [ ] Adjust part of the code in shardingsphere-sharding-core to support 
extracting shard key placeholder indexes
   - [ ] Implement route cache plugin + YAML Rule configuration
   - [ ] Complete the documentation
   - [ ] Improve surrounding integrations: Spring API, DistSQL, etc.
   - [ ] Design the cache layer interface in the main process of ShardingSphere
   
   The current solution does not modify the main process of ShardingSphere, so 
that it can be implemented quickly. Sharding route caching is therefore 
implemented on top of the existing main process by inserting a new Rule.
   
   In the future, consider adding a cache layer interface to the main process 
of ShardingSphere, so that caching becomes part of the main process and each 
feature can provide its own cache implementation.


