morningman opened a new issue #5902:
URL: https://github.com/apache/incubator-doris/issues/5902
## Resource Division
In early versions, Doris supported the multi-Cluster feature, which aimed
to manage multiple BE node groups within the same Doris cluster to facilitate
unified management, with each cluster achieving node-level resource isolation.
But this feature has been deprecated because of several design problems,
mainly:
1. The design of the code itself led to tight coupling between various
pieces of metadata, making the code too heavy to maintain.
2. Clusters were too independent: data was isolated between clusters, and
data migration costs were high.
Therefore, we plan to implement a relatively lightweight node-level resource
isolation mechanism that meets the following requirements:
1. Reduce the burden on users of maintaining multiple clusters, enabling
unified management within a single cluster.
2. Achieve node-level resource isolation, so that different users can use
different node groups, for example to isolate online business from offline
business, or to isolate different departments.
3. Support storing the replicas of a tablet in different node groups, so that
users can query the same data with different node resources: isolating
resources while sharing data.
4. Keep the code loosely coupled and easy to maintain.
### Design
The overall design ideas are as follows:
1. Support setting a Resource Tag for each BE node; BEs are grouped by
resource tag into resource groups.
2. Support specifying how the replicas of a tablet are allocated across BE
nodes.
3. When querying, a user can access only the replicas in the resource groups
permitted by their privileges, and the query uses only the computing
resources of those resource groups.
4. Support permission verification for resource groups.
The detailed design plan is as follows:
1. Set the resource tag
A user can specify a resource tag for a BE when adding it, or modify the
tag at runtime. For simplicity, we currently support only a single tag per
BE. To remain compatible with the original logic and to unify subsequent
processing, each BE gets a default tag unless one is specified.
```
alter system add backend "1272:9050, 1212:9050"
properties("tag.location" = "zoneA");
alter system modify backend "1272:9050, 1212:9050" set ("tag.location" =
"zoneB");
```
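For compatibility, a BE added without properties would carry the default tag. A minimal sketch (assuming the backend list exposes a Tag column, which is an assumption about the eventual output):

```
-- A backend added without a tag gets the default tag
alter system add backend "host3:9050";
-- Inspect the tags of all backends (exact output columns are an assumption)
show backends;
```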
2. Specify the allocation of replicas
Here we discuss the following situations:
1. Specify the replica allocation in the table creation statement
We can specify the replica allocation in the properties of the CREATE
TABLE statement. For example, two of the three replicas are placed in
resource group A and one replica in resource group B. This requires the user
to have access to the corresponding resource groups, and the resource groups
must contain enough nodes.
```
CREATE TABLE example_db.table_hash
(
k1 TINYINT
)
DISTRIBUTED BY HASH(k1) BUCKETS 32
PROPERTIES (
"replication_allocation"="tag.location.zone1:2,
tag.location.zone2:1"
);
```
2. Resource division at the database level
Specifying the replica allocation when creating a table is flexible, but
the operation is cumbersome: the user has to specify an allocation every time
a table is created. In some scenarios, the business is divided by database,
and the resource allocation of the tables within a database stays consistent.
Therefore, we can also support specifying resource division at the database
level: a db specifies its own resources, and its tables then use this
resource setting. A setting in a table creation statement can still override
the setting of the db.
To simplify the design, db-level resource division does not support
multiple tags; when db-level resource division is used, all replicas are in
one resource group.
```
alter database db1 set ("replication_allocation" =
"tag.location.zone1:3");
```
3. Resource division at the partition level
In fact, the replica allocation of a table is stored at the partition
level: each partition saves its own replica allocation information. So
although we specify resource division at the db or table level, the
information is ultimately stored per partition. Therefore, we can also
support modifying the replica allocation at the partition level; these
modifications can happen in the add partition and modify partition
operations.
4. Modify allocation information
To simplify the design, this issue only supports modifying allocation
information at the partition level. As a follow-up, we are considering how
to support modifying the allocation information of an entire table or an
entire database.
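The partition-level operations above could look like the following sketch (the exact property syntax is an assumption and may differ in the final implementation):

```
-- Add a new partition with its own replica allocation (hypothetical syntax)
ALTER TABLE example_db.table_range ADD PARTITION p202105
VALUES LESS THAN ("2021-06-01")
("replication_allocation" = "tag.location.zone1:2, tag.location.zone2:1");

-- Modify the replica allocation of an existing partition
ALTER TABLE example_db.table_range MODIFY PARTITION p202105
SET ("replication_allocation" = "tag.location.zone1:3");
```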
3. Support specifying resources when querying
This permission is set in the user property. A user can only query
replicas of data on resources they have permission for, and the query will
only run in the corresponding resource groups. A user can be granted
permissions for multiple resource groups.
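Setting this per-user permission might look like the following sketch (the property name `resource_tags.location` is an assumption chosen for illustration, mirroring the `tag.location` naming above):

```
-- Restrict user1 to querying replicas in (and running on) zone1 and zone2
SET PROPERTY FOR 'user1' 'resource_tags.location' = 'zone1, zone2';
```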
4. Replica scheduling
Replica scheduling includes replica repair and replica balancing.
1. Repair replicas
Replica repair covers cases such as missing replicas, missing versions,
node decommissioning, redundant replicas, replicas not in the expected
resource group, Colocation distribution, and so on.
2. Replica balance
Replica balancing is performed within a resource group: first collect
load statistics for the nodes in a resource group, then migrate tablets from
high-load nodes to low-load nodes.
## Schedule
* Step1: Support setting tags for BE nodes, and support creating tables with
specified tags.
* Step2: Support tablet repair and balance based on tags.
* Step3: Support privilege checking of tags.