TeslaCN opened a new issue, #21223:
URL: https://github.com/apache/shardingsphere/issues/21223
## Cache ideas
Caching the routing calculation of the data sharding feature reduces the CPU
usage of ShardingSphere by trading space for time.
### Cache premises
#### The sharding algorithm ensures the consistency of input and output
Consider defining a concept, CacheableShardingAlgorithm, that marks an
algorithm's results as cacheable. The premise is that the same input always
yields the same output, regardless of circumstances or time. The built-in Mod
and Range algorithms are examples.
The following sharding algorithms are not cacheable:
- The built-in INLINE algorithm. Because its expression is free-form, the
correctness of a cached result cannot be determined. INLINE is also not
recommended for scenarios with demanding performance requirements.
- Algorithms whose output is affected by time, for example those that call
`new Date()` or `LocalDate.now()`.
- Algorithms whose input and output are not guaranteed to be 1:1, for
example a custom algorithm such as:
  - `Id == 0 -> ds_0`
  - `Id == 1 -> ds_1`
  - `Id == 2 -> ds_2, ds_3`
  - ...
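
One way to picture the CacheableShardingAlgorithm concept is as a marker interface. The sketch below is only an illustration under stated assumptions: `SimpleShardingAlgorithm`, `ModAlgorithm`, `TimeDependentAlgorithm` and `CacheabilityCheck` are hypothetical names and do not reflect the real ShardingSphere SPI.

```java
import java.util.Collection;
import java.util.Collections;

/** Hypothetical marker: implementations promise that equal inputs always give equal outputs. */
interface CacheableShardingAlgorithm {
}

/** Simplified stand-in for a sharding algorithm; not the real ShardingSphere SPI. */
interface SimpleShardingAlgorithm {
    Collection<String> doSharding(long shardingValue);
}

/** Deterministic modulo routing, so it may carry the marker. */
final class ModAlgorithm implements SimpleShardingAlgorithm, CacheableShardingAlgorithm {
    private final int shardingCount;

    ModAlgorithm(int shardingCount) {
        this.shardingCount = shardingCount;
    }

    @Override
    public Collection<String> doSharding(long shardingValue) {
        return Collections.singleton("ds_" + Math.floorMod(shardingValue, (long) shardingCount));
    }
}

/** Output depends on the clock, so it must not carry the marker. */
final class TimeDependentAlgorithm implements SimpleShardingAlgorithm {
    @Override
    public Collection<String> doSharding(long shardingValue) {
        return Collections.singleton("ds_" + (System.currentTimeMillis() % 2));
    }
}

final class CacheabilityCheck {
    /** Only algorithms that opt in through the marker are treated as cacheable. */
    static boolean isCacheable(SimpleShardingAlgorithm algorithm) {
        return algorithm instanceof CacheableShardingAlgorithm;
    }
}
```

With an opt-in marker, an algorithm that says nothing about its determinism is treated as non-cacheable by default, which is the safe direction for custom algorithms.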
#### Parameter values are not generated by functions
For example, with datetime-based sharding, a function such as `now()` may be
used on the sharding key.
Traversing the SQL to detect the `now()` function is not easy. However,
`now()` is currently the only such function that ShardingSphere processes, so
if all sharding key values are of the Number type, none of them was generated
by `now()`.
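
The Number-type shortcut described above can be sketched as follows. `ShardingValueTypeCheck` is a hypothetical helper, and treating an empty value list as "not cacheable" is an assumption made here, not something the proposal states.

```java
import java.util.List;

final class ShardingValueTypeCheck {

    /**
     * Returns true only if there is at least one sharding key value and every value is a Number.
     * A Number cannot have been produced by now(), so no SQL traversal is needed.
     */
    static boolean allNumbers(List<?> shardingValues) {
        if (shardingValues.isEmpty()) {
            return false; // assumption: nothing to cache on without sharding key values
        }
        for (Object each : shardingValues) {
            if (!(each instanceof Number)) {
                return false;
            }
        }
        return true;
    }
}
```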
#### The sharded table must hit a single data node
If a statement on a sharded table does not route to a single data node, the
overhead at the network and database levels is likely to dominate. In that
case the cost of ShardingSphere itself is a comparatively small share of the
total, so caching the routing calculation brings little improvement.
### DML cache conditions
For broadcast tables, INSERT/DELETE/UPDATE routing can be cached directly.
SELECT routing on broadcast tables is not cached because the kernel layer
already optimizes it.
#### INSERT
- The statement is not an INSERT SELECT
- The statement has no ON DUPLICATE KEY UPDATE clause
- No ID needs to be generated: GeneratedKeyContext is empty or generated is
false
- VALUES contains only one group of values
- Every value in VALUES is one of the following types:
  - literal: LiteralExpressionSegment
  - placeholder: ParameterMarkerExpressionSegment
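
The INSERT conditions above could be checked roughly as in the sketch below. `InsertInfo` and `InsertCacheCondition` are hypothetical, simplified types introduced only for illustration; a real check would inspect the ShardingSphere statement context instead.

```java
import java.util.List;

/** Hypothetical, simplified description of an INSERT statement; not a ShardingSphere class. */
final class InsertInfo {
    enum ValueKind { LITERAL, PARAMETER_MARKER, OTHER }

    final boolean insertSelect;
    final boolean onDuplicateKeyUpdate;
    final boolean keyGenerated;
    final List<List<ValueKind>> valueGroups;

    InsertInfo(boolean insertSelect, boolean onDuplicateKeyUpdate, boolean keyGenerated,
               List<List<ValueKind>> valueGroups) {
        this.insertSelect = insertSelect;
        this.onDuplicateKeyUpdate = onDuplicateKeyUpdate;
        this.keyGenerated = keyGenerated;
        this.valueGroups = valueGroups;
    }
}

final class InsertCacheCondition {

    static boolean isCacheable(InsertInfo insert) {
        // No INSERT SELECT, no ON DUPLICATE KEY UPDATE, no generated key.
        if (insert.insertSelect || insert.onDuplicateKeyUpdate || insert.keyGenerated) {
            return false;
        }
        // Exactly one group of values.
        if (insert.valueGroups.size() != 1) {
            return false;
        }
        // Every value must be a literal or a placeholder.
        for (InsertInfo.ValueKind each : insert.valueGroups.get(0)) {
            if (each == InsertInfo.ValueKind.OTHER) {
                return false;
            }
        }
        return true;
    }
}
```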
#### DELETE / UPDATE
- The SQL involves only one table
- If it is a sharded table, its sharding algorithm must be cacheable
- If it is a broadcast table, caching is allowed
#### SELECT
- Statements that query only broadcast tables are not cached
- Tables in a multi-table join must be binding tables or broadcast tables
### Cache cleanup
#### Passive cleanup
- The cache must be freed automatically when memory is low.
#### Active cleanup
- Clear the cache when metadata changes;
- Clear the cache when rules change.
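
Both cleanup modes can be illustrated with the JDK alone. The sketch below assumes soft references are an acceptable mechanism for the passive case; `SoftValueCache` is a hypothetical class, and it omits the size limits a production cache would need.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class SoftValueCache<K, V> {
    private final Map<K, SoftReference<V>> store = new ConcurrentHashMap<>();

    void put(K key, V value) {
        // Passive cleanup: softly reachable values may be reclaimed by the GC under memory pressure.
        store.put(key, new SoftReference<>(value));
    }

    V get(K key) {
        SoftReference<V> reference = store.get(key);
        if (reference == null) {
            return null;
        }
        V value = reference.get();
        if (value == null) {
            // The value was collected; drop the stale entry.
            store.remove(key, reference);
        }
        return value;
    }

    /** Active cleanup: intended to be called when metadata or rules change. */
    void invalidateAll() {
        store.clear();
    }
}
```

A caller would invoke `invalidateAll()` from whatever hook observes metadata and rule changes; that hook is outside the scope of this sketch.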
## API design: a new ShardingCacheRule
The sharding cache is not enabled by default.
Configuring a ShardingCacheRule enables the sharding cache.
```yaml
- !SHARDING_CACHE
  allowedMaxSqlLength: 100
  routeCache:
    initialCapacity: 256
    maximumSize: 4096
    softValues: true
```
## Code design
### Enhance shardingsphere-sharding-core
ShardingConditionValue needs a new method. While the routing logic extracts
sharding key values, it also records the index of the placeholder that
supplies each sharding key value in the SQL.
```java
import java.util.List;

public interface ShardingConditionValue {
    
    // Original methods omitted...
    
    /**
     * Get parameter marker indexes.
     *
     * @return parameter marker indexes
     */
    List<Integer> getParameterMarkerIndexes();
}
```
The implementation classes of this interface need to implement the new
method, and the corresponding extraction logic needs to be added to the
following classes:
- WhereClauseShardingConditionEngine
- InsertClauseShardingConditionEngine
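
As a rough illustration of the extraction logic, the sketch below records the parameter index whenever a sharding key value comes from a placeholder. `ValueExpression`, `ExtractedCondition` and `ConditionExtractor` are hypothetical stand-ins, not the real segment or engine classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a value expression on a sharding column. */
final class ValueExpression {
    final Object literal;   // only meaningful when markerIndex < 0
    final int markerIndex;  // position in the parameter list, or -1 for a literal

    private ValueExpression(Object literal, int markerIndex) {
        this.literal = literal;
        this.markerIndex = markerIndex;
    }

    static ValueExpression literal(Object value) {
        return new ValueExpression(value, -1);
    }

    static ValueExpression marker(int index) {
        return new ValueExpression(null, index);
    }
}

/** Simplified counterpart of a sharding condition value with the proposed extra field. */
final class ExtractedCondition {
    final List<Object> values = new ArrayList<>();
    final List<Integer> parameterMarkerIndexes = new ArrayList<>();
}

final class ConditionExtractor {

    static ExtractedCondition extract(List<ValueExpression> expressions, List<Object> parameters) {
        ExtractedCondition result = new ExtractedCondition();
        for (ValueExpression each : expressions) {
            if (each.markerIndex >= 0) {
                // Placeholder: resolve the value and remember where it came from.
                result.values.add(parameters.get(each.markerIndex));
                result.parameterMarkerIndexes.add(each.markerIndex);
            } else {
                result.values.add(each.literal);
            }
        }
        return result;
    }
}
```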
### Add a new module shardingsphere-sharding-cache to shardingsphere-sharding-plugin
```
shardingsphere-sharding-plugin
├── shardingsphere-sharding-cache
├── shardingsphere-sharding-cosid
├── shardingsphere-sharding-nanoid
```
### Cache entry point
https://github.com/apache/shardingsphere/blob/5df14e707e290e482878f11786ef402024e317ed/shardingsphere-infra/shardingsphere-infra-route/src/main/java/org/apache/shardingsphere/infra/route/engine/impl/PartialSQLRouteExecutor.java#L54-L69
To avoid coupling with the original ShardingSQLRouter, a new implementation,
CachedShardingSQLRouter, is added and placed in front of ShardingSQLRouter
through OrderedSPI.
Logic of CachedShardingSQLRouter:
1. Check whether the statement satisfies the cache premises; if not, return
an empty RouteContext.
2. Obtain the parameter indexes of all sharding keys in the statement and
look up the SQL together with the actual sharding key parameters in the
cache; on a miss, invoke ShardingSQLRouter to calculate the route.
3. Check whether the routing result hits a single data node; if it does not,
the result is not cached.
4. Put the RouteContext into the cache and return a deep copy of the cached
RouteContext.
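
The four steps can be sketched with plain JDK types. Everything here is a simplification under stated assumptions: `CachedRouterSketch` and `CheckResult` are hypothetical, a route is modelled as a list of data node names rather than a RouteContext, the original router is passed in as a function, and the maps are not thread-safe.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Single-threaded sketch; a route is modelled as a list of data node names. */
final class CachedRouterSketch {

    /** Outcome of the cacheability check, plus the sharding key parameter indexes. */
    static final class CheckResult {
        final boolean cacheable;
        final List<Integer> shardingKeyParameterIndexes;

        CheckResult(boolean cacheable, List<Integer> shardingKeyParameterIndexes) {
            this.cacheable = cacheable;
            this.shardingKeyParameterIndexes = shardingKeyParameterIndexes;
        }
    }

    private final Map<String, CheckResult> checkCache = new HashMap<>();
    private final Map<List<Object>, List<String>> routeCache = new HashMap<>();
    private final Function<String, CheckResult> checker;
    private final BiFunction<String, List<Object>, List<String>> originalRouter;

    CachedRouterSketch(Function<String, CheckResult> checker,
                       BiFunction<String, List<Object>, List<String>> originalRouter) {
        this.checker = checker;
        this.originalRouter = originalRouter;
    }

    List<String> route(String sql, List<Object> parameters) {
        // Step 1: a non-cacheable statement yields an empty result, leaving routing to the original router.
        CheckResult check = checkCache.computeIfAbsent(sql, checker);
        if (!check.cacheable) {
            return new ArrayList<>();
        }
        // Step 2: build the key from the SQL and the actual sharding key parameters.
        List<Object> key = new ArrayList<>();
        key.add(sql);
        for (int index : check.shardingKeyParameterIndexes) {
            key.add(parameters.get(index));
        }
        List<String> cached = routeCache.get(key);
        if (cached == null) {
            List<String> computed = originalRouter.apply(sql, parameters);
            // Step 3: only results that hit exactly one data node are cached.
            if (computed.size() != 1) {
                return computed;
            }
            // Step 4: store a private copy of the result.
            cached = new ArrayList<>(computed);
            routeCache.put(key, cached);
        }
        // Return a copy so that callers cannot mutate the cached value.
        return new ArrayList<>(cached);
    }
}
```

Copying on both write and read stands in for the deep copy of the RouteContext: later stages may modify the returned object, and those changes must not leak into the cache.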
### Cache structure
#### Cache 1: whether the SQL satisfies the cache conditions, plus the indexes of the sharding key placeholders in the parameter list
Cache 1 corresponds to steps 1 and 2 of the CachedShardingSQLRouter logic.
Key:
- SQL
Value:
- `boolean cacheable`: whether the statement is cacheable
- `List<Integer>`: the indexes of the sharding key placeholders in the
parameter list
#### Cache 2: route results for an SQL statement and its sharding key parameters
Cache 2 corresponds to steps 3 and 4 of the CachedShardingSQLRouter logic.
Key:
- SQL
- `List<Object>`: the actual sharding key parameters
Value:
- RouteContext
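
A composite key for Cache 2 needs value-based `equals` and `hashCode` over both the SQL and the sharding key parameters. The sketch below shows one possible shape; `RouteCacheKey` is a hypothetical name, and it assumes the parameter objects themselves implement `equals` and `hashCode` sensibly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

final class RouteCacheKey {
    private final String sql;
    private final List<Object> shardingKeyParameters;

    RouteCacheKey(String sql, List<Object> shardingKeyParameters) {
        this.sql = Objects.requireNonNull(sql);
        // Defensive copy so later changes to the caller's list cannot alter the key.
        this.shardingKeyParameters = Collections.unmodifiableList(new ArrayList<>(shardingKeyParameters));
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof RouteCacheKey)) {
            return false;
        }
        RouteCacheKey that = (RouteCacheKey) other;
        return sql.equals(that.sql) && shardingKeyParameters.equals(that.shardingKeyParameters);
    }

    @Override
    public int hashCode() {
        return Objects.hash(sql, shardingKeyParameters);
    }
}
```

Only the sharding key parameters go into the key, not the full parameter list, so statements that differ in non-sharding parameters still share one cached route.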
## Steps
- [ ] Adjust part of the code in shardingsphere-sharding-core to support
extracting sharding key placeholder indexes
- [ ] Implement the route cache plugin and its YAML rule configuration
- [ ] Complete the documentation
- [ ] Improve the surrounding pieces: Spring API, DistSQL, etc.
- [ ] Design a cache layer interface in the main process of ShardingSphere
The current solution does not modify the main process of ShardingSphere, so
that it can be implemented quickly. Sharding route caching is therefore
implemented on top of the existing main process by inserting a new rule.
In the future, a cache layer interface should be considered for the main
process of ShardingSphere, so that caching becomes part of the main process
and cache implementations can be designed for different features.