morningman opened a new issue #5902:
URL: https://github.com/apache/incubator-doris/issues/5902


   ## Resource Division
   
   Early versions of Doris supported a multi-Cluster feature, which managed 
multiple groups of BE nodes within a single Doris cluster to facilitate unified 
management. Each such cluster provided node-level resource isolation.
   
   However, this feature has been deprecated because of design problems, 
mainly:
   
   1. The code design tightly couples the various kinds of metadata, which makes 
the code heavy and hard to maintain.
   
   2. Clusters are too independent: data is isolated between them, so migrating 
data is expensive.
   
   Therefore, we plan to implement a relatively lightweight form of node-level 
resource isolation that meets the following requirements:
   
   1. Reduce the burden on users of maintaining multiple clusters by enabling 
unified management within a single cluster.
   2. Provide node-level resource isolation, so that different users can use 
different node groups, for example to isolate online business from offline 
business, or one department from another.
   3. Allow the replicas of a tablet to be stored in different node groups, so 
that users can query the same data with different node resources. Resources are 
isolated while data is shared.
   4. Keep the code loosely coupled and easy to maintain.
   
   ### Design
   
   The overall design ideas are as follows:
   
   1. Support setting resource tags on BE nodes. BEs are grouped by resource 
tag into resource groups.
   2. Support specifying how replicas are allocated across BE nodes.
   3. When querying, a user can access only the replicas in the resource groups 
they have privileges on, and the query uses only the computing resources of 
those groups.
   4. Support resource group permission verification.
   
   The detailed design plan is as follows:
   
   1. Set the resource tag
   
       A user can specify a resource tag for a BE when adding it, or modify the 
tag at runtime. For simplicity, each BE currently supports only a single tag. 
To stay compatible with the original logic and to keep subsequent processing 
uniform, a BE for which no tag is specified gets a default tag.
       
       ```
       alter system add backend "1272:9050, 1212:9050" properties("tag.location" = "zoneA");
       alter system modify backend "1272:9050, 1212:9050" set ("tag.location" = "zoneB");
       ```
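
As a rough illustration of the tagging rule above (one tag per BE, with a 
fallback tag), the following Python sketch groups BEs by tag. It is not Doris 
code: the `Backend` class, the host names `be1` to `be3` and the default tag 
name `default` are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass

DEFAULT_TAG = "default"  # assumed name; the issue does not name the default tag


@dataclass(frozen=True)
class Backend:
    host: str  # hypothetical host names are used below
    port: int
    tag: str = DEFAULT_TAG  # exactly one tag per BE


def group_by_tag(backends):
    """Return {tag: [backend, ...]}, i.e. the resource groups."""
    groups = defaultdict(list)
    for be in backends:
        groups[be.tag].append(be)
    return dict(groups)


backends = [
    Backend("be1", 9050, "zoneA"),
    Backend("be2", 9050, "zoneA"),
    Backend("be3", 9050),  # no tag given, falls back to the default tag
]
groups = group_by_tag(backends)
```

Because every BE always carries a tag, later logic only has to deal with 
resource groups and never with "untagged" nodes.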
       
   2. Specify the allocation of replicas
   
       Here we discuss the following situations:
       
       1. Specify the replica allocation in the table creation statement
   
           We can specify the allocation of replicas in the properties of the 
table creation statement. For example, two of the three replicas are in 
resource group A, and one replica is in resource group B. This requires that 
the user has access to the corresponding resource groups, and that each of 
those groups has enough nodes to hold its replicas.
           
           ```
           CREATE TABLE example_db.table_hash
           (
           k1 TINYINT
           )
           DISTRIBUTED BY HASH(k1) BUCKETS 32
           PROPERTIES (
               "replication_allocation"="tag.location.zone1:1, tag.location.zone2:2"
           );
           ```
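
To make the two checks on a table-level allocation concrete, here is a small 
Python sketch that parses an allocation string of the form 
`tag.location.<group>:<count>` and verifies that each group has enough nodes. 
The function names are hypothetical, and two rules are assumptions rather than 
statements from this issue: a group may appear only once in the string, and 
replicas of one tablet must sit on distinct BEs.

```python
def parse_replica_allocation(spec):
    """Parse 'tag.location.<group>:<count>, ...' into {group: count}."""
    prefix = "tag.location."
    allocation = {}
    for part in spec.split(","):
        key, _, count = part.strip().rpartition(":")
        key = key.strip()
        if not key.startswith(prefix):
            raise ValueError(f"unexpected tag key: {key!r}")
        group = key[len(prefix):]
        if group in allocation:  # assumed rule: one entry per group
            raise ValueError(f"resource group listed twice: {group!r}")
        allocation[group] = int(count)
    return allocation


def check_capacity(allocation, nodes_per_group):
    """Assumed rule: a group needs at least as many BEs as replicas placed in it."""
    for group, count in allocation.items():
        available = nodes_per_group.get(group, 0)
        if available < count:
            raise ValueError(f"group {group!r} has {available} BE(s), needs {count}")


allocation = parse_replica_allocation("tag.location.zone1:1, tag.location.zone2:2")
check_capacity(allocation, {"zone1": 3, "zone2": 2})  # passes silently
```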
           
       2. Resource division at the database level
   
           Specifying the replica allocation when creating a table is flexible 
but cumbersome, because the user has to specify an allocation for every new 
table. In some scenarios the business is divided by database, and the resource 
allocation of all tables in a database stays the same. Therefore, we can 
support resource division at the database level: a db is given its own resource 
setting, and the tables in that db then use this setting. A setting in the 
table creation statement still overrides the db setting. To simplify the 
design, db-level resource division does not support multiple tags, so when it 
is used, all replicas are in a single resource group.
           
           ```
           alter database db1 set ("replication_allocation" = "tag.location.zone1: 3");
           ```
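
The precedence between the db-level and table-level settings can be sketched 
as follows. This is only an illustration in Python: the function name 
`effective_allocation` and the cluster-wide fallback `{"default": 3}` are 
assumptions, not part of the proposal.

```python
def effective_allocation(cluster_default, db_allocation=None, table_allocation=None):
    """Pick the allocation for a new table: table setting > db setting > fallback."""
    if db_allocation is not None and len(db_allocation) != 1:
        # db-level division is limited to a single resource group
        raise ValueError("db-level allocation must name exactly one resource group")
    if table_allocation is not None:
        return dict(table_allocation)
    if db_allocation is not None:
        return dict(db_allocation)
    return dict(cluster_default)


CLUSTER_DEFAULT = {"default": 3}  # assumed fallback, not stated in the issue
inherited = effective_allocation(CLUSTER_DEFAULT, db_allocation={"zone1": 3})
overridden = effective_allocation(
    CLUSTER_DEFAULT, {"zone1": 3}, {"zone1": 1, "zone2": 2}
)
```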
           
       3. Division of resources at the partition level
   
           In fact, the table replica allocation is set at the partition 
level: each partition stores its own replica allocation information. So 
although resource division can be specified at the db and table levels, the 
information is ultimately stored per partition. We can therefore also support 
modifying the replica allocation at the partition level, in the add partition 
and modify partition operations.
           
       4. Modify allocation information
   
           To simplify the design, this issue only supports modifying 
allocation information at the partition level. As a follow-up, we are 
considering how to support modifying the allocation information of an entire 
table or an entire database.
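
The partition-level storage described in points 3 and 4 could be modelled 
roughly as below. The `Partition` and `Table` classes and their method names 
are hypothetical and only mirror the add partition and modify partition 
operations mentioned in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    name: str
    allocation: dict  # {resource group: replica count}, stored per partition


@dataclass
class Table:
    name: str
    default_allocation: dict  # resolved from the table or db setting at creation
    partitions: dict = field(default_factory=dict)

    def add_partition(self, name, allocation=None):
        # A new partition copies the table default unless it brings its own.
        chosen = allocation if allocation is not None else self.default_allocation
        self.partitions[name] = Partition(name, dict(chosen))

    def modify_partition(self, name, allocation):
        # Only this partition changes; other partitions keep what they stored.
        self.partitions[name].allocation = dict(allocation)


table = Table("table_hash", {"zone1": 3})
table.add_partition("p1")
table.add_partition("p2", {"zone1": 1, "zone2": 2})
table.modify_partition("p1", {"zone2": 3})
```

Since each partition holds its own copy, changing one partition does not touch 
the table default or the other partitions.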
           
   3. Support specifying resources when querying
   
       This permission is set as a user property. A user can only query 
replicas located in the resource groups they have permission for, and the query 
runs only in those resource groups. A user can be granted permissions for 
multiple resource groups.
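
A minimal sketch of this filtering step, assuming the user's permitted resource 
groups are already available as a set of tag names. The `Replica` class, the 
function name and the host names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    tablet_id: int
    backend: str  # hypothetical host name
    tag: str  # resource group of the BE holding this replica


def queryable_replicas(replicas, allowed_tags):
    """Keep only replicas that live in a resource group the user may use."""
    return [r for r in replicas if r.tag in allowed_tags]


replicas = [
    Replica(1, "be1", "zone1"),
    Replica(1, "be2", "zone2"),
    Replica(1, "be3", "zone2"),
]
visible = queryable_replicas(replicas, allowed_tags={"zone2"})
```

If the result is empty for some tablet, the user has no permitted replica of 
that data; how such a query should fail is not specified in this issue.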
   
   4. Replica scheduling
   
       Replica scheduling includes replica repair and replica balance.
   
       1. Replica repair
       
           Replica repair covers missing replicas, missing versions, offline 
nodes, redundant replicas, replicas that are not in their designated resource 
group, Colocation distribution mismatches, etc.
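
For the tag-related part of repair, one possible way to detect what is wrong is 
to compare the desired allocation with the resource groups of the existing 
replicas. The Python sketch below does only that; it ignores versions, offline 
nodes and Colocation, and the function name is hypothetical.

```python
from collections import Counter


def allocation_diff(desired, replica_tags):
    """Compare desired {group: count} with the groups of the existing replicas.

    Returns (missing, redundant): replicas to add and to drop, per group.
    A replica in a group that is not in `desired` shows up as redundant.
    """
    actual = Counter(replica_tags)
    missing, redundant = {}, {}
    for group in set(desired) | set(actual):
        delta = desired.get(group, 0) - actual.get(group, 0)
        if delta > 0:
            missing[group] = delta
        elif delta < 0:
            redundant[group] = -delta
    return missing, redundant


# Desired: 2 replicas in zone1, 1 in zone2. Actual: one in zone1, one in zone3.
missing, redundant = allocation_diff({"zone1": 2, "zone2": 1}, ["zone1", "zone3"])
```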
   
       2. Replica balance
       
           Replica balancing is performed within a resource group. First 
collect load statistics for the nodes in the resource group, and then migrate 
data shards (tablets) from high-load nodes to low-load nodes.
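
The per-group balancing idea can be sketched with a simple greedy loop. This is 
an illustration under assumptions, not the planned scheduler: load is measured 
as a plain tablet count, and `plan_balance`, `balance_all` and the host names 
are hypothetical.

```python
def plan_balance(load_by_backend, threshold=1):
    """Greedy plan of tablet moves inside ONE resource group.

    load_by_backend: {backend: tablet count}. Moves one tablet at a time from
    the most loaded to the least loaded BE until the gap is <= threshold.
    """
    if threshold < 1:
        raise ValueError("threshold must be >= 1, otherwise the loop may not stop")
    load = dict(load_by_backend)
    moves = []
    while len(load) > 1:
        src = max(load, key=load.get)
        dst = min(load, key=load.get)
        if load[src] - load[dst] <= threshold:
            break
        load[src] -= 1
        load[dst] += 1
        moves.append((src, dst))
    return moves, load


def balance_all(groups):
    """Balance each resource group independently; no move crosses groups."""
    return {tag: plan_balance(load) for tag, load in groups.items()}


plans = balance_all({
    "zone1": {"be1": 10, "be2": 4},
    "zone2": {"be3": 5, "be4": 5},
})
```

Because each group is planned on its own, a tablet never leaves its resource 
group, which keeps balancing from undoing the replica allocation.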
   
   ## Schedule
   
   * Step 1: Support setting tags for BE nodes, and support creating tables 
with specified tags.
   * Step 2: Support tablet repair and balance based on tags.
   * Step 3: Support privilege checking of tags.
   

