kimmking opened a new issue #4896: [DISCUSS]MetadataCenter Design 5.x
URL: https://github.com/apache/incubator-shardingsphere/issues/4896
 
 
   MetadataCenter Design5.x is a part of Orchestration 5.x Design(#4515).
   
   ## Metadata Center Design
   
   The purpose of this document is to illustrate the metadata center design of 
the Apache ShardingSphere governance module.
   
   [TOC]
   
   ### 1. Definition
   
   The word 'metadata' in this article is the metadata of the data source used 
by Sharding-JDBC / Sharding-Proxy. These metadata are the core data objects to 
ensure the correct operations of each component of ShardingSphere. Currently, 
they are scattered in each part of the system. They should be reorganized and 
managed in a unified manner using a new component such like 'metadata center', 
and coordinate changes when metadatas changes.
   
   
   
   ### 2. Types
   
   The current metadata object models are mainly defined in:
   
   > org.apache.shardingsphere.sql.parser.binder
   >
   > ├─column
   > │      ColumnMetaData.java
   > │      ColumnMetaDataLoader.java
   > ├─index
   > │      IndexMetaData.java
   > │      IndexMetaDataLoader.java
   > ├─schema
   > │      SchemaMetaData.java
   > │      SchemaMetaDataLoader.java
   > └─table
   >      TableMetaData.java
   >      TableMetaDataLoader.java
   >
   > 
   >
   > Hierarchical relationship is `schema > table > column + index`
   >
   > At the same time, the scaling module also has a subset of models and 
loaders that need to be merged (#4866).
   
   
   
   One issue to further discuss is:
   
   > Question 1: Should sharding rules and so on be put in the metadata center 
uniformly?
   >
   > It is recommended to only process datasource metadatas right now, and rule 
datas should still be placed in config center. See if we want to adjust them 
later.
   
   
   
   ### 3. Current Metadata Loading
   
   The unified entry for loading metadatas in:
   
   > 
org.apache.shardingsphere.sql.parser.binder.metadata.schema.SchemaMetaDataLoader
   
   
![image-20200319190724018](https://user-images.githubusercontent.com/807508/77133013-76df2780-6a9c-11ea-9a98-54d69bd43599.png)
   
   Metadatas are loaded in three places:
   
   1、Sharding-JDBC
   
   2、Sharding-Proxy
   
   > 
Bootstrap.startWithRegistryCenter->LogicSchemas.init/initSchemas->LogicSchemas.initSchemas(for)
   >
   > 
->LogicSchemaFactory.newInstance->XXSchema->XXSchema.createMetaData/loadSchemaMetaData
   >
   > ->SchemaMetaDataLoader.load(dataSource, maxConnectionsSizePerQuery)
   
   Each XXSchema class implements guava's @ Subscribe's renew method, which can 
execute the corresponding rules to refresh them(rules, not metadatas) when an 
event is received. 
   
   ShardingSchema and MasterSlaveSchema can additionally support the 
disableEvent to disable the data source. (This section have a lot codes those 
can be optimized.)
   
   sharding table metadatas loading:
   
   > org.apache.shardingsphere.core.metadata.ShardingMetaDataLoader
   
   Load the logic part and the default part separately, and then call 
SchemaMetaDataLoader to load according to the hierarchy.
   
   3、Sharding-scaling
   
   There are an independent TableMetaDataLoader/ColumnMetaDataLoader here, used 
to load its own TableMetaData/ColumnMetaData.
   
   
   
   ### 4. Metadata Usage 
   
   **Route Module**
   
     * If it is SelectStatementContext, where condition exists, it will 
participate in obtaining ShadingConditions
     * If it is DDLStatement, DCLStatement, it will get the table data from it.
   
   **Rewrite Module**
   
   - Determine if the column exists in metadata.
     - EncryptPredicateParameterRewriter
     - EncryptPredicateColumnTokenGenerator
     - EncryptPredicateRightValueTokenGenerator
   
   **Execute Module**
   
     * If it is the following types, it will refresh metaData.
   
    ```java
    if (sqlStatementContext instanceof CreateTableStatementContext) {
               refreshTableMetaData(runtimeContext, 
((CreateTableStatementContext) sqlStatementContext).getSqlStatement());
           } else if (sqlStatementContext instanceof 
AlterTableStatementContext) {
               refreshTableMetaData(runtimeContext, 
((AlterTableStatementContext) sqlStatementContext).getSqlStatement());
           } else if (sqlStatementContext instanceof DropTableStatementContext) 
{
               refreshTableMetaData(runtimeContext, 
((DropTableStatementContext) sqlStatementContext).getSqlStatement());
           } else if (sqlStatementContext instanceof 
CreateIndexStatementContext) {
               refreshTableMetaData(runtimeContext, 
((CreateIndexStatementContext) sqlStatementContext).getSqlStatement());
           } else if (sqlStatementContext instanceof DropIndexStatementContext) 
{
               refreshTableMetaData(runtimeContext, 
((DropIndexStatementContext) sqlStatementContext).getSqlStatement());
           }
    ```
   
   **Merge Module**
   
     * Not used, use SQL directly to return MetaData from ResultSet
   
   
   
   ### 5. Metadata Changes
   
   Currently, metadatas are loaded and managed by each started sharding-JDBC or 
proxy node.
   
   If a DDL is executed through one node, the following refresh method is 
directly called to refresh the metadata of the current node.
   
   JDBC:
   
   > 
org.apache.shardingsphere.shardingjdbc.executor.AbstractStatementExecutor.refreshMetaDataIfNeeded
   
   Proxy:
   
   > 
org.apache.shardingsphere.shardingproxy.backend.schema.impl.ShardingSchema.refreshTableMetaData
   
   A tip: The codes in these two places are also heavily duplicated.
   
   
   
   ### 6. Metadata Center Design
   
   From the above sections, we know that there are some points that need 
improvement:
   
   Improvement 1: 
   
   If there are multiple nodes started at the same time, a large number of 
metadatas will be repeatedly loaded from the DB, and maybe lead to a 
performance issue.
   
   Improvement 2:
   
   Once a node executed a DDL, other nodes don't know that the metadatas had 
changed, and lead to a consistency issue.
   
   We hope to solve these two problems via metadata center design.
   
   #### 6.1 Definition
   
   The metadata center is a mechanism for unified loading of metadatas, change 
notifications, and data synchronization through the unified management of all 
metadatas.
   
   #### 6.2 Feature
   
   It is planned to sort out the existing metadata loading and all usage 
scenarios, uniformly manage the metadata to be persisted to the 
CenterRepository since the metadatas are loaded for the first time, and 
subsequent nodes will start to obtain metadatas from the metadata center 
(Improvement 1). When a node performs a DDL operation, the node's metadatas is 
refreshed and synchronized to CenterRepository, and then all other nodes are 
notified to synchronize new datas from the metadata center (Improvement 2).
   
   #### 6.3 API
   
   On the basis of clearing up, the existing metadata loading logic will be 
restructured, and some loader code will be migrated to the new metadata center 
module:
   
   > sharding-orchestration-center-metadata
   
   And the following new API::
   
   1. MetadataCenter
   2. MetadataLoader
   3. MetadataNode
   4. MetadataListener
   5. MetadataChangeEvent
   
   
   
   #### 6.4 Data Structure
   
   The current metadata memory structure is as follows:
   
   
![image-20200319194148562](https://user-images.githubusercontent.com/807508/77133014-78105480-6a9c-11ea-972d-e4dad9f7bf08.png)
   
   
   There are two style available metadata structure configurations.
   
   CenterRepository metadata structure configuration style one:
   
   > ```
   > ├─orchestration-namespace
   > ├─orchestration-name
   > │  ├─metadata
   > │  │  ├─ip1:port/catalog/schema
   > │  │  │  ├─table1
   > │  │  │  │  ├─columns
   > │  │  │  │  └─indexs
   > │  │  │  ├─table2
   > │  │  │  │  ├─columns
   > │  │  │  │  └─indexs
   > │  │  ├─ip2:port/catalog/schema
   > │  │  │  ├─table3
   > │  │  │  │  ├─columns
   > │  │  │  │  └─indexs
   > │  │  │  ├─table4
   > │  │  │  │  ├─columns
   > │  │  │  │  └─indexs
   > ```
   
   CenterRepository metadata structure configuration style two:
   
   > ```
   > ├─orchestration-namespace
   > ├─orchestration-name
   > │  ├─metadata
   > │  │  ├─ip1:port/catalog/schema
   > │  │  │  ├─ [json/yaml text contents]
   > ```
   
   Style one: It is intuitive and fine-grained.
   
   Style two: It is simpler and easier to manage. 
   
   (It is recommended to use style 2.)
   
   
   
   One question:
   
   > Question 2: Will the logic table be displayed here?
   >
   > I think the answer is no.
   >
   > Only the actual real tables metadatas here, logic table is not a part of 
metadata.
   
   
   
   #### 6.5 Loading
   
   Current loading process:
   
   > Load sharding tables first, then load default tables, and check whether 
the metadatas of all tables is consistent according to the 
check.metadata.enable parameter.
   
   New loading process:
   
   > After loading metadata, write to CenterRepository, then trigger global 
notification.
   
   
   
   > Question 3: Are the metadatas of this step written to CenterRepository 
after each group or datasource is loaded, or after all metadatas loaded and 
written them once (related to triggering one or more events).
   >
   > Consider to implement a full synchronizing at first, and then see if it 
can be persisted and notified in batches, based on loading speed.
   
   #### 6.6 Synchronization
   
   After DDL is executed in one node, the metadatas of this node is refreshed 
through the Event mechanism(instead of local method calling),  synchronized to 
CenterRepository, and then a global change notification is triggered to other 
nodes.
   
   > Question 4: Do we need to hold the loading process of other nodes to 
prevent concurrency.
   >
   > Do not consider concurrency at first, implement the load and notify 
function, and finally solve this problem, maybe involving distributed locks.
   
   
![幻灯片4](https://user-images.githubusercontent.com/807508/77132931-2ff13200-6a9c-11ea-82a4-dbb10195b1c8.PNG)
   
   
   #### 6.7 Notification
   
   Through the Event mechanism, other nodes are notified and then update 
metadatas from CenterRepository.
   
   
![幻灯片5](https://user-images.githubusercontent.com/807508/77132933-31225f00-6a9c-11ea-9e97-12593880da3e.PNG)
   
   ### 7. Task list
   
   - [ ] 7.1 metadata carding
   - [ ] 7.2 metadata refactoring (code abstraction and cleanup)
   - [ ] 7.3 Add center-metadata related project structure
   - [ ] 7.4 Implementing metadata persistence
   - [ ] 7.5 Migrating some loader codes
   - [ ] 7.6 Implementing the event notification mechanism
   - [ ] 7.7 Implementing Global Synchronization
   - [ ] 7.8 Optimize loading and usage
   - [ ] 7.9 Improve Unittests
   - [ ] 7.10 Implementing examples
   - [ ] 7.11 Implementing documents
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to