This is an automated email from the ASF dual-hosted git repository.
morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-doris.git
The following commit(s) were added to refs/heads/master by this push:
new 71aedb994e [refactor][doc] add data-table doc (#8927)
71aedb994e is described below
commit 71aedb994ea32ef45f9723d43f1529d03e09023b
Author: smallhibiscus <[email protected]>
AuthorDate: Sat Apr 9 19:18:44 2022 +0800
[refactor][doc] add data-table doc (#8927)
---
new-docs/en/data-table/advance-usage.md | 82 ++++--
new-docs/en/data-table/basic-usage.md | 151 +++++-----
new-docs/en/data-table/data-partition.md | 192 ++++++-------
new-docs/en/data-table/hit-the-rollup.md | 188 ++++++++++--
new-docs/zh-CN/data-table/advance-usage.md | 269 +++++++++++++++++-
new-docs/zh-CN/data-table/basic-usage.md | 393 ++++++++++++++++++++++++-
new-docs/zh-CN/data-table/data-partition.md | 379 +++++++++++++++++++++++-
new-docs/zh-CN/data-table/hit-the-rollup.md | 427 +++++++++++++++++++++++++++-
8 files changed, 1863 insertions(+), 218 deletions(-)
diff --git a/new-docs/en/data-table/advance-usage.md
b/new-docs/en/data-table/advance-usage.md
index 5429b53c39..87f2073f5a 100644
--- a/new-docs/en/data-table/advance-usage.md
+++ b/new-docs/en/data-table/advance-usage.md
@@ -28,9 +28,9 @@ under the License.
Here we introduce some of Doris's advanced features.
-## Table 1 Structural Change
+## Table Structural Change
-Schema of the table can be modified using the ALTER TABLE command, including
the following modifications:
+Schema of the table can be modified using the [ALTER TABLE
COLUMN](../sql-manual/sql-reference-v2/Data-Definition-Statements/Alter/ALTER-TABLE-COLUMN.html)
command, including the following modifications:
* Add columns
* Delete columns
@@ -54,17 +54,21 @@ Schema of Table 1 is as follows:
We added a new column of uv, type BIGINT, aggregation type SUM, default value
is 0:
-`ALTER TABLE table1 ADD COLUMN uv BIGINT SUM DEFAULT '0' after pv;`
+```sql
+ALTER TABLE table1 ADD COLUMN uv BIGINT SUM DEFAULT '0' after pv;
+```
After successful submission, you can view the progress of the job with the
following command:
-`SHOW ALTER TABLE COLUMN;`
+```sql
+SHOW ALTER TABLE COLUMN;
+```
-When the job state is FINISHED, the job is completed. The new Schema is in
force.
+When the job state is `FINISHED`, the job is completed. The new Schema is in
force.
After ALTER TABLE is completed, you can view the latest Schema through `DESC
TABLE`.
-```
+```sql
mysql> DESC table1;
+----------+-------------+------+-------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
@@ -80,13 +84,17 @@ mysql> DESC table1;
The following command can be used to cancel the job currently being executed:
-`CANCEL ALTER TABLE COLUMN FROM table1`
+```sql
+CANCEL ALTER TABLE COLUMN FROM table1;
+```
+
+For more help, see `HELP ALTER TABLE`.
-For more help, see `HELP ALTER TABLE'.
+## Rollup
-## 2 Rollup
+Rollup can be understood as a materialized index structure of a Table.
**materialized** because data is stored independently as a concrete
("materialized") table, and **indexing** means that Rollup can adjust column
order to increase the hit rate of the prefix index, or reduce key columns to
increase data aggregation.
-Rollup can be understood as a materialized index structure of Table. **
materialized ** because data is store as a concrete ("materialized") table
independently, and ** indexing ** means that Rollup can adjust column order to
increase the hit rate of prefix index, or reduce key column to increase data
aggregation.
+Use [ALTER TABLE
ROLLUP](../sql-manual/sql-reference-v2/Data-Definition-Statements/Alter/ALTER-TABLE-ROLLUP.html)
to perform various rollup changes.
Examples are given below.
@@ -106,17 +114,21 @@ Schema of Table 1 is as follows:
In table1's detailed data, siteid, citycode, and username form a set of keys,
over which the pv field is aggregated. If the business side often needs to see
the total amount of pv per city, it can build a rollup with only citycode
and pv.
-`ALTER TABLE table1 ADD ROLLUP rollup_city(citycode, pv);`
+```sql
+ALTER TABLE table1 ADD ROLLUP rollup_city(citycode, pv);
+```
After successful submission, you can view the progress of the job with the
following command:
-`SHOW ALTER TABLE ROLLUP;`
+```sql
+SHOW ALTER TABLE ROLLUP;
+```
-When the job state is FINISHED, the job is completed.
+When the job state is `FINISHED`, the job is completed.
When Rollup is established, you can use `DESC table1 ALL` to view the Rollup
information of the table.
-```
+```sql
mysql> desc table1 all;
+-------------+----------+-------------+------+-------+---------+-------+
| IndexName | Field | Type | Null | Key | Default | Extra |
@@ -135,15 +147,17 @@ mysql> desc table1 all;
The following command can be used to cancel the job currently being executed:
-`CANCEL ALTER TABLE ROLLUP FROM table1;`
+```sql
+CANCEL ALTER TABLE ROLLUP FROM table1;
+```
-After Rollup is established, the query does not need to specify Rollup to
query. Or specify the original table for query. The program automatically
determines whether Rollup should be used. Whether Rollup is hit or not can be
viewed by the `EXPLAIN your_sql;'command.
+After Rollup is established, queries do not need to specify the Rollup
explicitly; simply query the original table as before. The program
automatically determines whether the Rollup should be used. Whether the Rollup
is hit can be viewed with the `EXPLAIN your_sql;` command.
For more help, see `HELP ALTER TABLE`.
-## 2 Query of Data Table
+## Query of Data Table
-### 2.1 Memory Limitation
+### Memory Limitation
To prevent a user's query from consuming too much memory, queries are
memory-controlled: a query task uses no more than 2GB of memory by default
on a single BE node.
@@ -155,7 +169,7 @@ If it is found that 2GB memory cannot be satisfied, the
memory parameters can be
Display query memory limits:
-```
+```sql
mysql> SHOW VARIABLES LIKE "%mem_limit%";
+---------------+------------+
| Variable_name | Value |
@@ -167,9 +181,9 @@ mysql> SHOW VARIABLES LIKE "%mem_limit%";
The unit of `exec_mem_limit` is bytes, and the value of `exec_mem_limit` can be
changed by the `SET` command, for example to 8GB:
-`SET exec_mem_limit = 8589934592;`
-
-```
+```sql
+mysql> SET exec_mem_limit = 8589934592;
+Query OK, 0 rows affected (0.00 sec)
mysql> SHOW VARIABLES LIKE "%mem_limit%";
+---------------+------------+
| Variable_name | Value |
@@ -182,13 +196,13 @@ mysql> SHOW VARIABLES LIKE "%mem_limit%";
>* The above modification is session level and is only valid within the
>current connection session. Disconnecting and reconnecting will change back
>to the default value.
>* If you need to modify the global variable, you can set it as follows: `SET
>GLOBAL exec_mem_limit = 8589934592;` When the setup is complete, disconnect
>the session and log in again, and the parameters will take effect permanently.
-### 2.2 Query timeout
+### Query timeout
The current default maximum query time is 300 seconds. If a query is not
completed within 300 seconds, it will be cancelled by the Doris system.
Users can use this parameter to customize the timeout of their
applications and achieve a blocking mode similar to wait(timeout).
View the current timeout settings:
-```
+```sql
mysql> SHOW VARIABLES LIKE "%query_timeout%";
+---------------+-------+
| Variable_name | Value |
@@ -200,12 +214,15 @@ mysql> SHOW VARIABLES LIKE "%query_timeout%";
Modify the timeout to 1 minute:
-`SET query timeout =60;`
+```sql
+mysql> SET query_timeout = 60;
+Query OK, 0 rows affected (0.00 sec)
+```
>* The current timeout check interval is 5 seconds, so timeouts less than 5
>seconds are not very accurate.
>* The above modifications are also session level. Global validity can be
>modified by `SET GLOBAL`.
-### 2.3 Broadcast/Shuffle Join
+### Broadcast/Shuffle Join
By default, the system implements Join by conditionally filtering the small
table, broadcasting it to the nodes where the large table is located, forming
an in-memory Hash table, and then streaming the data of the large table
through for the Hash Join. However, if the filtered data of the small table
cannot fit into memory, the Join cannot complete; the usual result is a
memory limit exceeded error.
@@ -215,7 +232,7 @@ Doris will try to use Broadcast Join first. If small tables
are too large to bro
Use Broadcast Join (default):
-```
+```sql
mysql> select sum(table1.pv) from table1 join table2 where table1.siteid = 2;
+--------------------+
| sum(`table1`.`pv`) |
@@ -227,7 +244,7 @@ mysql> select sum(table1.pv) from table1 join table2 where
table1.siteid = 2;
Use Broadcast Join (explicitly specified):
-```
+```sql
mysql> select sum(table1.pv) from table1 join [broadcast] table2 where
table1.siteid = 2;
+--------------------+
| sum(`table1`.`pv`) |
@@ -239,7 +256,7 @@ mysql> select sum(table1.pv) from table1 join [broadcast]
table2 where table1.si
Shuffle Join:
-```
+```sql
mysql> select sum(table1.pv) from table1 join [shuffle] table2 where
table1.siteid = 2;
+--------------------+
| sum(`table1`.`pv`) |
@@ -249,7 +266,7 @@ mysql> select sum(table1.pv) from table1 join [shuffle]
table2 where table1.site
1 row in set (0.15 sec)
```
-### 2.4 Query Retry and High Availability
+### Query Retry and High Availability
When multiple FE nodes are deployed, users can deploy load balancing layers on
top of multiple FEs to achieve high availability of Doris.
@@ -271,4 +288,7 @@
jdbc:mysql://[host1][:port1],[host2][:port2][,[host3][:port3]]...[/[database]][?
Applications can connect to a MySQL Proxy deployed on the same machine as the
application, by configuring MySQL Proxy's Failover and Load Balance functions.
-`http://dev.mysql.com/doc/refman/5.6/en/mysql-proxy-using.html`
\ No newline at end of file
+```
+https://dev.mysql.com/doc/refman/5.6/en/proxy-users.html
+```
+
diff --git a/new-docs/en/data-table/basic-usage.md
b/new-docs/en/data-table/basic-usage.md
index 4819c76da1..56c2581d82 100644
--- a/new-docs/en/data-table/basic-usage.md
+++ b/new-docs/en/data-table/basic-usage.md
@@ -29,56 +29,75 @@ under the License.
Doris uses MySQL protocol to communicate. Users can connect to Doris cluster
through MySQL client or MySQL JDBC. When selecting the MySQL client version, it
is recommended to use the version after 5.1, because user names of more than 16
characters cannot be supported before 5.1. This paper takes MySQL client as an
example to show users the basic usage of Doris through a complete process.
-## 1 Create Users
+## Create Users
-### 1.1 Root User Logon and Password Modification
+### Root User Logon and Password Modification
Doris has built-in root and admin users, and the password is empty by default.
After starting the Doris program, you can connect to the Doris cluster through
root or admin users.
Use the following command to log in to Doris:
-```
-mysql -h FE_HOST -P9030 -uroot
+```sql
+[root@doris ~]# mysql -h FE_HOST -P9030 -uroot
+Welcome to the MySQL monitor. Commands end with ; or \g.
+Your MySQL connection id is 41
+Server version: 5.1.0 Doris version 1.0.0-preview2-b48ee2734
+
+Copyright (c) 2000, 2022, Oracle and/or its affiliates.
+
+Oracle is a registered trademark of Oracle Corporation and/or its
+affiliates. Other names may be trademarks of their respective
+owners.
+
+Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
+
+mysql>
```
->` fe_host` is the IP address of any FE node. ` 9030 ` is the query_port
configuration in fe.conf.
+> `FE_HOST` is the IP address of any FE node. `9030` is the query_port
configuration in fe.conf.
After login, you can modify the root password with the following command:
-```
-SET PASSWORD FOR 'root' = PASSWORD('your_password');
+```sql
+mysql> SET PASSWORD FOR 'root' = PASSWORD('your_password');
+Query OK, 0 rows affected (0.00 sec)
```
-### 1.3 Creating New Users
+> `your_password` is a new password for the `root` user, which can be set at
will. A strong password is recommended for security. Use the new password the
next time you log in.
-Create an ordinary user with the following command.
+### Creating New Users
-```
-CREATE USER 'test' IDENTIFIED BY 'test_passwd';
+We can create a regular user `test` with the following command:
+
+```sql
+mysql> CREATE USER 'test' IDENTIFIED BY 'test_passwd';
+Query OK, 0 rows affected (0.00 sec)
```
Follow-up login can be done through the following connection commands.
-```
-mysql -h FE_HOST -P9030 -utest -ptest_passwd
+```sql
+[root@doris ~]# mysql -h FE_HOST -P9030 -utest -ptest_passwd
```
> By default, the newly created common user does not have any permissions.
> For granting permissions, see the later section on authorization.
-## 2 Data Table Creation and Data Import
+## Data Table Creation and Data Import
-### 2.1 Create a database
+### Create a database
Initially, a database can be created through root or admin users:
-`CREATE DATABASE example_db;`
-
-> All commands can use `HELP` command to see detailed grammar help. For
example: `HELP CREATE DATABASE;'`
+```sql
+CREATE DATABASE example_db;
+```
+> All commands can use the `HELP` command to see detailed syntax help, for
example `HELP CREATE DATABASE;`. You can also refer to the official [SHOW CREATE
DATABASE](../sql-manual/sql-reference-v2/Show-Statements/SHOW-CREATE-DATABASE.html)
command manual.
+>
> If you don't know the full name of a command, you can use "help" plus a
> fragment of the command for fuzzy matching. If you type `HELP CREATE`, you
> can match commands like `CREATE DATABASE`, `CREATE TABLE`, `CREATE USER`, etc.
-After the database is created, you can view the database information through
`SHOW DATABASES'.
+After the database is created, you can view the database information through
[SHOW
DATABASES](../sql-manual/sql-reference-v2/Show-Statements/SHOW-DATABASES.html#show-databases).
-```
+```sql
MySQL> SHOW DATABASES;
+--------------------+
| Database |
@@ -89,41 +108,29 @@ MySQL> SHOW DATABASES;
2 rows in set (0.00 sec)
```
-Information_schema exists to be compatible with MySQL protocol. In practice,
information may not be very accurate. Therefore, information about specific
databases is suggested to be obtained by directly querying the corresponding
databases.
+> Information_schema exists for compatibility with the MySQL protocol. In
practice, its information may not be entirely accurate, so it is recommended
to obtain information about specific databases by directly querying the
corresponding databases.
-### 2.2 Account Authorization
+### Account Authorization
After the example_db is created, the read and write permissions of example_db
can be authorized to ordinary accounts, such as test, through the root/admin
account. After authorization, the example_db database can be operated by
logging in with the test account.
-`GRANT ALL ON example_db TO test;`
-
-### 2.3 Formulation
-
-Create a table using the `CREATE TABLE` command. More detailed parameters can
be seen:
-
-`HELP CREATE TABLE;`
-
-First switch the database:
-
-`USE example_db;`
-
-Doris supports single partition and composite partition.
-
-In the composite partition:
-
-* The first level is called Partition, or partition. Users can specify a
dimension column as a partition column (currently only integer and time type
columns are supported), and specify the range of values for each partition.
+```sql
+mysql> GRANT ALL ON example_db TO test;
+Query OK, 0 rows affected (0.01 sec)
+```
-* The second stage is called Distribution, or bucket division. Users can
specify one or more dimension columns and the number of buckets for HASH
distribution of data.
+### Create a table
-Composite partitioning is recommended for the following scenarios
+Create a table using the [CREATE
TABLE](../sql-manual/sql-reference-v2/Data-Definition-Statements/Create/CREATE-TABLE.html)
command. More detailed parameters can be seen with `HELP CREATE TABLE;`.
-* There are time dimensions or similar dimensions with ordered values, which
can be used as partition columns. The partition granularity can be evaluated
according to the frequency of importation and the amount of partition data.
-* Historic data deletion requirements: If there is a need to delete historical
data (for example, only the last N days of data are retained). Using composite
partitions, you can achieve this by deleting historical partitions. Data can
also be deleted by sending a DELETE statement within a specified partition.
-* Solve the data skew problem: Each partition can specify the number of
buckets separately. If dividing by day, when the amount of data varies greatly
every day, we can divide the data of different partitions reasonably by the
number of buckets in the specified partition. Bucket columns recommend choosing
columns with high degree of differentiation.
+First, we need to switch databases using the
[USE](../sql-manual/sql-reference-v2/Utility-Statements/USE.html) command:
-Users can also use no composite partitions, even single partitions. Then the
data are only distributed by HASH.
+```sql
+mysql> USE example_db;
+Database changed
+```
-Taking the aggregation model as an example, the following two partitions are
illustrated separately.
+Doris supports two table building methods, [composite partition and single
partition](data-partition.html#composite partition and single partition). The
following takes the aggregate model as an example to demonstrate how to create
data tables with these two kinds of partitioning.
#### Single partition
@@ -137,7 +144,7 @@ The schema of this table is as follows:
* pv: Type is BIGINT (8 bytes), default value is 0; this is a metric column,
and Doris aggregates metric columns internally. The aggregation method of
this column is SUM.
The TABLE statement is as follows:
-```
+```sql
CREATE TABLE table1
(
siteid INT DEFAULT '10',
@@ -173,7 +180,7 @@ We use the event_day column as the partition column to
create three partitions:
Each partition uses siteid to hash buckets, with a bucket count of 10
The TABLE statement is as follows:
-```
+```sql
CREATE TABLE table2
(
event_day DATE,
@@ -195,7 +202,7 @@ PROPERTIES("replication_num" = "1");
After the table is built, you can view the information of the table in
example_db:
-```
+```sql
MySQL> SHOW TABLES;
+----------------------+
| Tables_in_example_db |
@@ -233,14 +240,14 @@ MySQL> DESC table2;
>
> 1. By setting replication_num, the above tables are all single-copy tables.
> Doris recommends that users adopt the default three-copy settings to ensure
> high availability.
> 2. Composite partition tables can have partitions added or deleted
> dynamically. See the Partition section in `HELP ALTER TABLE` and the sketch
> after this list.
-> 3. Data import can import the specified Partition. See `HELP LOAD'.
-> 4. Schema of table can be dynamically modified.
+> 3. Data import can import into a specified Partition. See `HELP LOAD;`.
+> 4. The Schema of a table can be dynamically modified. See `HELP ALTER TABLE;`.
> 5. Rollup can be added to Table to improve query performance. This section
> can be referred to the description of Rollup in Advanced Usage Guide.
> 6. The default value of Null property for column is true, which may result
> in poor scan performance.
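+
+As referenced in note 2 above, a minimal sketch of dynamically adding and
+dropping partitions of table2; the partition names and values here are
+assumptions:
+
+```sql
+ALTER TABLE table2 ADD PARTITION p201708 VALUES LESS THAN ("2017-08-01");
+ALTER TABLE table2 DROP PARTITION p201706;
+```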
-### 2.4 Import data
+### Import data
-Doris supports a variety of data import methods. Specifically, you can refer
to the data import document. Here we use streaming import and Broker import as
examples.
+Doris supports a variety of data import methods. Specifically, you can refer
to the [data import](../data-operate/import/load-manual.html) document. Here we
use streaming import and Broker import as examples.
#### Stream load
@@ -248,7 +255,7 @@ Streaming import transfers data to Doris via HTTP protocol.
It can import local
Example 1: With "table1_20170707" as Label, import table1 tables using the
local file table1_data.
-```
+```bash
curl --location-trusted -u test:test_passwd -H "label:table1_20170707" -H
"column_separator:," -T table1_data
http://FE_HOST:8030/api/example_db/table1/_stream_load
```
@@ -257,7 +264,7 @@ curl --location-trusted -u test:test_passwd -H
"label:table1_20170707" -H "colum
The local file `table1_data` uses `,` as the separator between fields; its
specific contents are as follows:
-```
+```text
1,1,Jim,2
2,1,grace,2
3,2,tom,2
@@ -267,7 +274,7 @@ The local file `table1_data` takes `,` as the separation
between data, and the s
Example 2: With "table2_20170707" as Label, import table2 tables using the
local file table2_data.
-```
+```bash
curl --location-trusted -u test:test -H "label:table2_20170707" -H
"column_separator:|" -T table2_data
http://127.0.0.1:8030/api/example_db/table2/_stream_load
```
@@ -293,7 +300,7 @@ Broker imports import data from external storage through
deployed Broker process
Example: Import files on HDFS into table1 table with "table1_20170708" as Label
-```
+```sql
LOAD LABEL table1_20170708
(
DATA INFILE("hdfs://your.namenode.host:port/dir/table1_data")
@@ -313,23 +320,27 @@ PROPERTIES
Broker imports are asynchronous commands. Successful execution of the above
commands only indicates successful submission of the task. Successful imports
need to be checked through `SHOW LOAD;`. Such as:
-`SHOW LOAD WHERE LABLE = "table1_20170708";`
+```sql
+SHOW LOAD WHERE LABEL = "table1_20170708";
+```
-In the return result, FINISHED in the `State` field indicates that the import
was successful.
+In the return result, `FINISHED` in the `State` field indicates that the
import was successful.
For more instructions on `SHOW LOAD`, see `HELP SHOW LOAD;`.
Asynchronous import tasks can be cancelled before the end:
-`CANCEL LOAD WHERE LABEL = "table1_20170708";`
+```sql
+CANCEL LOAD WHERE LABEL = "table1_20170708";
+```
-## 3 Data query
+## Data query
-### 3.1 Simple Query
+### Simple Query
-Examples:
+Query example:
-```
+```sql
MySQL> SELECT * FROM table1 LIMIT 3;
+--------+----------+----------+------+
| siteid | citycode | username | pv |
@@ -353,11 +364,11 @@ MySQL> SELECT * FROM table1 ORDER BY citycode;
5 rows in set (0.01 sec)
```
-### 3.3 Join Query
+### Join Query
-Examples:
+Query example:
-```
+```sql
MySQL> SELECT SUM(table1.pv) FROM table1 JOIN table2 WHERE table1.siteid =
table2.siteid;
+--------------------+
| sum(`table1`.`pv`) |
@@ -367,11 +378,11 @@ MySQL> SELECT SUM(table1.pv) FROM table1 JOIN table2
WHERE table1.siteid = table
1 row in set (0.20 sec)
```
-### 3.4 Subquery
+### Subquery
-Examples:
+Query example:
-```
+```sql
MySQL> SELECT SUM(pv) FROM table2 WHERE siteid IN (SELECT siteid FROM table1
WHERE siteid > 2);
+-----------+
| sum(`pv`) |
diff --git a/new-docs/en/data-table/data-partition.md
b/new-docs/en/data-table/data-partition.md
index 4ab0241d95..714c7340cb 100644
--- a/new-docs/en/data-table/data-partition.md
+++ b/new-docs/en/data-table/data-partition.md
@@ -36,7 +36,7 @@ In Doris, data is logically described in the form of a table.
A table includes rows and columns: a row is a single line of user data, and a
column describes the different fields in a row of data.
-Column can be divided into two broad categories: Key and Value. From a
business perspective, Key and Value can correspond to dimension columns and
metric columns, respectively. From the perspective of the aggregation model,
the same row of Key columns will be aggregated into one row. The way the Value
column is aggregated is specified by the user when the table is built. For an
introduction to more aggregation models, see the [Doris Data
Model](./data-model-rollup.md).
+Columns can be divided into two broad categories: Key and Value. From a
business perspective, Key and Value correspond to dimension columns and
metric columns, respectively. From the perspective of the aggregation model,
rows with the same Key columns are aggregated into one row. The way the Value
columns are aggregated is specified by the user when the table is built. For an
introduction to more aggregation models, see the [Doris Data
Model](./data-model.html).
### Tablet & Partition
@@ -50,13 +50,11 @@ Several Partitions form a Table. Partition can be thought
of as the smallest log
We use a table-building operation to illustrate Doris' data partitioning.
-Doris's built-in table is a synchronous command. If the command returns
successfully, it means that the table is built successfully.
-
-See more help with `HELP CREATE TABLE;`.
+Doris's table creation is a synchronous command: the result is returned once
the SQL execution completes, and a successful return means the table was
created successfully. For the specific table creation syntax, please refer to
[CREATE
TABLE](../sql-manual/sql-reference-v2/Data-Definition-Statements/Create/CREATE-TABLE.html),
or view more help with `HELP CREATE TABLE;`.
This section introduces Doris's approach to building tables with an example.
-```
+```sql
-- Range Partition
CREATE TABLE IF NOT EXISTS example_db.expamle_range_tbl
@@ -124,7 +122,7 @@ PROPERTIES
### Column Definition
-Here we only use the AGGREGATE KEY data model as an example. See the [Doris
Data Model](./data-model-rollup.md) for more data models.
+Here we only use the AGGREGATE KEY data model as an example. See the [Doris
Data Model](./data-model.html) for more data models.
The basic type of column can be viewed by executing `HELP CREATE TABLE;` in
mysql-client.
@@ -134,11 +132,11 @@ When defining columns, you can refer to the following
suggestions:
1. The Key column must precede all Value columns.
2. Try to choose integer types, because calculations and lookups on integer
types are much more efficient than on strings.
-3. For the selection principle of integer types of different lengths, follow
** enough to **.
-4. For lengths of type VARCHAR and STRING, follow ** is sufficient.
+3. For the selection of integer types of different lengths, follow the
principle of **just big enough**.
+4. For the lengths of the VARCHAR and STRING types, likewise follow the principle of **just big enough**.
5. The total byte length of all columns (including Key and Value) cannot
exceed 100KB.
-### Partitioning and binning
+### Partitioning and Bucketing
Doris supports two levels of data partitioning. The first layer is Partition,
which supports Range and List partitioning. The second layer is the Bucket
(Tablet), which only supports Hash partitioning.
@@ -150,11 +148,14 @@ It is also possible to use only one layer of
partitioning. When using a layer pa
* Regardless of the type of partition column, double quotes are required
when writing partition values.
* There is no theoretical limit on the number of partitions.
* When you do not use Partition to build a table, the system will
automatically generate a Partition with the same name as the table name. This
Partition is not visible to the user and cannot be modified.
-
+ * **Do not add partitions with overlapping ranges** when creating
partitions.
+
#### Range Partition
* Partition columns are usually time columns for easy management of old
and new data.
+
    * Partition supports only specifying the upper bound via `VALUES LESS THAN
(...)`; the system uses the upper bound of the previous partition as the lower
bound of the new partition, generating a left-closed, right-open interval. It
also supports specifying both the upper and lower bounds via `VALUES [...)`,
generating a left-closed, right-open interval.
+
* It is easier to understand by specifying `VALUES [...)`. Here is an
example of the change in partition range when adding or deleting partitions
using the `VALUES LESS THAN (...)` statement:
* As in the `example_range_tbl` example above, when the table is
built, the following 3 partitions are automatically generated:
```
@@ -162,8 +163,9 @@ It is also possible to use only one layer of partitioning.
When using a layer pa
P201702: [2017-02-01, 2017-03-01)
P201703: [2017-03-01, 2017-04-01)
```
+
* When we add a partition p201705 VALUES LESS THAN ("2017-06-01"), the
partition results are as follows:
-
+
```
P201701: [MIN_VALUE, 2017-02-01)
P201702: [2017-02-01, 2017-03-01)
@@ -186,11 +188,12 @@ It is also possible to use only one layer of
partitioning. When using a layer pa
```
p201701: [MIN_VALUE, 2017-02-01)
p201705: [2017-04-01, 2017-06-01)
- The void range becomes: [2017-02-01, 2017-04-01)
```
- * Now add a partition p201702new VALUES LESS THAN ("2017-03-01"), the
partition results are as follows:
+ > The void range becomes: [2017-02-01, 2017-04-01)
+ * Now add a partition p201702new VALUES LESS THAN ("2017-03-01"), the
partition results are as follows:
+
```
p201701: [MIN_VALUE, 2017-02-01)
p201702new: [2017-02-01, 2017-03-01)
@@ -200,7 +203,7 @@ It is also possible to use only one layer of partitioning.
When using a layer pa
> You can see that the hole size is reduced to: [2017-03-01,
2017-04-01)
* Now delete partition p201701 and add partition p201612 VALUES LESS
THAN ("2017-01-01"), the partition result is as follows:
-
+
```
p201612: [MIN_VALUE, 2017-01-01)
p201702new: [2017-02-01, 2017-03-01)
@@ -209,9 +212,42 @@ It is also possible to use only one layer of partitioning.
When using a layer pa
> A new void appeared: [2017-01-01, 2017-02-01)
+
In summary, the deletion of a partition does not change the scope of an
existing partition. There may be holes in deleting partitions. When a partition
is added by the `VALUES LESS THAN` statement, the lower bound of the partition
immediately follows the upper bound of the previous partition.
+
+ In addition to the single-column partitioning we have seen above, Range
partition also supports **multi-column partitioning**, examples are as follows:
+
+ ```text
+ PARTITION BY RANGE(`date`, `id`)
+ (
+ PARTITION `p201701_1000` VALUES LESS THAN ("2017-02-01", "1000"),
+ PARTITION `p201702_2000` VALUES LESS THAN ("2017-03-01", "2000"),
+ PARTITION `p201703_all` VALUES LESS THAN ("2017-04-01")
+ )
+ ```
- You cannot add partitions with overlapping ranges.
+ In the above example, we specify `date` (DATE type) and `id` (INT type) as
partition columns. The resulting partitions in the above example are as follows:
+
+ ```
+ *p201701_1000: [(MIN_VALUE, MIN_VALUE), ("2017-02-01", "1000") )
+ *p201702_2000: [("2017-02-01", "1000"), ("2017-03-01", "2000") )
+ *p201703_all: [("2017-03-01", "2000"), ("2017-04-01", MIN_VALUE))
+ ```
+
+ Note that in the last partition, the user specified only the partition
value of the `date` column, so the partition value of the `id` column is filled
with `MIN_VALUE` by default. When the user inserts data, the partition column
values are compared in order to determine the corresponding partition.
Examples are as follows:
Examples are as follows:
+
+ ```
+ * Data --> Partition
+ * 2017-01-01, 200 --> p201701_1000
+ * 2017-01-01, 2000 --> p201701_1000
+ * 2017-02-01, 100 --> p201701_1000
+ * 2017-02-01, 2000 --> p201702_2000
+ * 2017-02-15, 5000 --> p201702_2000
+ * 2017-03-01, 2000 --> p201703_all
+ * 2017-03-10, 1 --> p201703_all
+ * 2017-04-01, 1000 --> Unable to import
+ * 2017-05-01, 1000 --> Unable to import
+ ```
#### List Partition
@@ -244,7 +280,36 @@ It is also possible to use only one layer of partitioning.
When using a layer pa
p_uk: ("London")
```
- You cannot add partitions with overlapping ranges.
+ List partition also supports **multi-column partition**, examples are as
follows:
+
+ ```text
+ PARTITION BY LIST(`id`, `city`)
+ (
+ PARTITION `p1_city` VALUES IN (("1", "Beijing"), ("1", "Shanghai")),
+ PARTITION `p2_city` VALUES IN (("2", "Beijing"), ("2", "Shanghai")),
+ PARTITION `p3_city` VALUES IN (("3", "Beijing"), ("3", "Shanghai"))
+ )
+ ```
+
+ In the above example, we specify `id` (INT type) and `city` (VARCHAR type)
as partition columns. The above example ends up with the following partitions.
+
+ ```
+ * p1_city: [("1", "Beijing"), ("1", "Shanghai")]
+ * p2_city: [("2", "Beijing"), ("2", "Shanghai")]
+ * p3_city: [("3", "Beijing"), ("3", "Shanghai")]
+ ```
+
+ When the user inserts data, the partition column values will be compared
sequentially in order to finally get the corresponding partition. An example is
as follows.
+
+ ```
+ * Data ---> Partition
+ * 1, Beijing ---> p1_city
+ * 1, Shanghai ---> p1_city
+ * 2, Shanghai ---> p2_city
+ * 3, Beijing ---> p3_city
+ * 1, Tianjin ---> Unable to import
+ * 4, Beijing ---> Unable to import
+ ```
2. Bucket
@@ -267,96 +332,27 @@ It is also possible to use only one layer of
partitioning. When using a layer pa
* Once the number of Buckets for a Partition is specified, it cannot be
changed. Therefore, when determining the number of Buckets, you need to
consider the expansion of the cluster in advance. For example, there are
currently only 3 hosts, and each host has 1 disk. If the number of Buckets is
only set to 3 or less, then even if you add more machines later, you can't
increase the concurrency.
    * To give some examples: suppose there are 10 BEs, each with one disk. If
the total size of a table is 500MB, 4-8 shards can be considered; for 5GB,
8-16 shards; for 50GB, 32 shards; for 500GB, partitioning is recommended, with
each partition about 50GB in size and 16-32 shards per partition; for 5TB,
partitioning is recommended, with each partition about 50GB in size and 16-32
shards per partition.
- > Note: The amount of data in the table can be viewed by the `show data`
command. The result is divided by the number of copies, which is the amount of
data in the table.
+ > Note: The amount of data in the table can be viewed with the [SHOW
DATA](../sql-manual/sql-reference-v2/Show-Statements/SHOW-DATA.html) command.
Dividing the result by the number of replicas gives the actual amount of data
in the table.
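+
+To illustrate the bucketing notes above, a minimal sketch of adding a
+partition with its own bucket count; the partition name and value are
+assumptions, and the hash column must match the table's distribution columns:
+
+```sql
+ALTER TABLE example_db.expamle_range_tbl
+ADD PARTITION p201705 VALUES LESS THAN ("2017-06-01")
+DISTRIBUTED BY HASH(`user_id`) BUCKETS 20;
+```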
-#### Multi-column partition
-
-Doris supports specifying multiple columns as partition columns, examples are
as follows:
-
-##### Range Partition
-
- ```
- PARTITION BY RANGE(`date`, `id`)
- (
- PARTITION `p201701_1000` VALUES LESS THAN ("2017-02-01", "1000"),
- PARTITION `p201702_2000` VALUES LESS THAN ("2017-03-01", "2000"),
- PARTITION `p201703_all` VALUES LESS THAN ("2017-04-01")
- )
- ```
-
- In the above example, we specify `date` (DATE type) and `id` (INT type) as
partition columns. The resulting partitions in the above example are as follows:
-
- ```
- *p201701_1000: [(MIN_VALUE, MIN_VALUE), ("2017-02-01", "1000") )
- *p201702_2000: [("2017-02-01", "1000"), ("2017-03-01", "2000") )
- *p201703_all: [("2017-03-01", "2000"), ("2017-04-01", MIN_VALUE))
- ```
-
- Note that the last partition user defaults only the partition value of the
`date` column, so the partition value of the `id` column will be filled with
`MIN_VALUE` by default. When the user inserts data, the partition column values
are compared in order, and the corresponding partition is finally obtained.
Examples are as follows:
-
- ```
- * Data --> Partition
- * 2017-01-01, 200 --> p201701_1000
- * 2017-01-01, 2000 --> p201701_1000
- * 2017-02-01, 100 --> p201701_1000
- * 2017-02-01, 2000 --> p201702_2000
- * 2017-02-15, 5000 --> p201702_2000
- * 2017-03-01, 2000 --> p201703_all
- * 2017-03-10, 1 --> p201703_all
- * 2017-04-01, 1000 --> Unable to import
- * 2017-05-01, 1000 --> Unable to import
- ```
-##### List Partition
+#### Compound Partitions vs Single Partitions
- ```
- PARTITION BY LIST(`id`, `city`)
- (
- PARTITION `p1_city` VALUES IN (("1", "Beijing"), ("1", "Shanghai")),
- PARTITION `p2_city` VALUES IN (("2", "Beijing"), ("2", "Shanghai")),
- PARTITION `p3_city` VALUES IN (("3", "Beijing"), ("3", "Shanghai"))
- )
- ```
+In a compound partition:
- In the above example, we specify `id`(INT type) and `city`(VARCHAR type)
as partition columns. The above example ends up with the following partitions.
+- The first level is called Partition, i.e. partitioning. Users can specify a
dimension column as the partition column (currently only columns of integer and
time types are supported), and specify the value range of each partition.
+- The second level is called Distribution, which means bucketing. Users can
specify one or more dimension columns and the number of buckets to perform HASH
distribution on the data.
- ```
- * p1_city: [("1", "Beijing"), ("1", "Shanghai")]
- * p2_city: [("2", "Beijing"), ("2", "Shanghai")]
- * p3_city: [("3", "Beijing"), ("3", "Shanghai")]
- ```
+Composite partitions are recommended for the following scenarios:
- When the user inserts data, the partition column values will be compared
sequentially in order to finally get the corresponding partition. An example is
as follows.
+- There is a time dimension or similar dimension with ordered values, which
can be used as a partition column. Partition granularity can be evaluated based
on import frequency, partition data volume, etc.
+- Deletion of historical data: If there is a need to delete historical data
(for example, only keep the data of the last N days). With composite
partitions, this can be achieved by removing historical partitions. Data
deletion is also possible by sending a DELETE statement within the specified
partition.
+- Solve the problem of data skew: each partition can specify the number of
buckets individually. For example, partitioning by day, when the amount of data
per day varies greatly, you can reasonably divide the data in different
partitions by specifying the number of buckets for the partition. It is
recommended to select a column with a large degree of discrimination for the
bucketing column.
- ```
- * Data ---> Partition
- * 1, Beijing ---> p1_city
- * 1, Shanghai ---> p1_city
- * 2, Shanghai ---> p2_city
- * 3, Beijing ---> p3_city
- * 1, Tianjin ---> Unable to import
- * 4, Beijing ---> Unable to import
- ```
+The user can also use a single partition instead of a composite partition;
the data is then only distributed by HASH.
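+
+As a minimal sketch of the historical-data-deletion scenario above (the
+table, partition, and condition are assumptions), old partitions can be
+dropped, or rows deleted within a specified partition:
+
+```sql
+ALTER TABLE example_db.expamle_range_tbl DROP PARTITION p201701;
+DELETE FROM example_db.expamle_range_tbl PARTITION p201702 WHERE `date` < "2017-02-15";
+```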
### PROPERTIES
-In the last PROPERTIES of the table statement, you can specify the following
two parameters:
-
-Replication_num
-
- * The number of copies per tablet. The default is 3, it is recommended to
keep the default. In the build statement, the number of Tablet copies in all
Partitions is uniformly specified. When you add a new partition, you can
individually specify the number of copies of the tablet in the new partition.
- * The number of copies can be modified at runtime. It is strongly
recommended to keep odd numbers.
- * The maximum number of copies depends on the number of independent IPs in
the cluster (note that it is not the number of BEs). The principle of replica
distribution in Doris is that the copies of the same Tablet are not allowed to
be distributed on the same physical machine, and the physical machine is
identified as IP. Therefore, even if 3 or more BE instances are deployed on the
same physical machine, if the BEs have the same IP, you can only set the number
of copies to 1.
- * For some small, and infrequently updated dimension tables, consider
setting more copies. In this way, when joining queries, there is a greater
probability of local data join.
-
-2. storage_medium & storage\_cooldown\_time
-
- * The BE data storage directory can be explicitly specified as SSD or HDD
(differentiated by .SSD or .HDD suffix). When you build a table, you can
uniformly specify the media for all Partition initial storage. Note that the
suffix is to explicitly specify the disk media without checking to see if it
matches the actual media type.
- * The default initial storage media can be specified by
`default_storage_medium= XXX` in the fe configuration file `fe.conf`, or, if
not, by default, HDD. If specified as an SSD, the data is initially stored on
the SSD.
- * If storage\_cooldown\_time is not specified, the data is automatically
migrated from the SSD to the HDD after 30 days by default. If
storage\_cooldown\_time is specified, the data will not migrate until the
storage_cooldown_time time is reached.
- * Note that when storage_medium is specified, if FE parameter
'enable_strict_storage_medium_check' is' False 'this parameter is simply a' do
your best 'setting. Even if SSD storage media is not set up within the cluster,
no errors are reported, and it is automatically stored in the available data
directory.
- Similarly, if the SSD media is not accessible and space is insufficient,
it is possible to initially store data directly on other available media. When
the data is due to be migrated to an HDD, the migration may also fail (but will
try again and again) if the HDD medium is not accessible and space is
insufficient.
- If FE parameter 'enable_strict_storage_medium_check' is' True ', then
'Failed to find enough host in all Backends with storage medium is SSD' will be
reported when SSD storage medium is not set in the cluster.
+For the parameters that can be set in the final PROPERTIES clause of the
table building statement, see [CREATE
TABLE](../sql-manual/sql-reference-v2/Data-Definition-Statements/Create/CREATE-TABLE.html)
for a detailed introduction.
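+
+For illustration, a sketch of some commonly used PROPERTIES; the values
+shown are assumptions, not recommendations:
+
+```sql
+PROPERTIES
+(
+    "replication_num" = "3",
+    "storage_medium" = "SSD",
+    "storage_cooldown_time" = "2022-06-01 00:00:00"
+)
+```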
### ENGINE
@@ -396,3 +392,7 @@ In this example, the type of ENGINE is olap, the default
ENGINE type. In Doris,
    Doris's table creation command is a synchronous command. The timeout of
this command is currently set rather simply, i.e. (tablet num *
replication num) seconds. If many data shards are created and some shard
creation fails, an error may be returned only after waiting for a long
timeout.
Under normal circumstances, the statement will return in a few seconds or
ten seconds. If it is more than one minute, it is recommended to cancel this
operation directly and go to the FE or BE log to view the related errors.
+
+## More help
+
+For more detailed instructions on data partitioning, refer to the [CREATE
TABLE](../sql-manual/sql-reference-v2/Data-Definition-Statements/Create/CREATE-TABLE.html)
command manual, or enter `HELP CREATE TABLE;` in the MySQL client to get more
help information.
diff --git a/new-docs/en/data-table/hit-the-rollup.md
b/new-docs/en/data-table/hit-the-rollup.md
index 4cc8d7ed42..6690d70d42 100644
--- a/new-docs/en/data-table/hit-the-rollup.md
+++ b/new-docs/en/data-table/hit-the-rollup.md
@@ -26,6 +26,156 @@ under the License.
# Rollup and query
+ROLLUP in multidimensional analysis means "roll up", that is, data is
aggregated further at a specified granularity.
+
+## Basic concepts
+
+In Doris, we call the table created by the user through the CREATE TABLE
statement the Base table. The Base table holds the basic data, stored in the
way specified by the user's table building statement.
+
+On top of the Base table, we can create any number of ROLLUP tables. The
ROLLUP data is generated from the Base table and is physically **stored
independently**.
+
+The basic function of ROLLUP tables is to obtain coarser-grained aggregated
data on the basis of the Base table.
+
+Let's illustrate the ROLLUP tables and their roles in different data models
with examples.
+
+### ROLLUP in Aggregate Model and Uniq Model
+
+Because Uniq is only a special case of the Aggregate model, we do not
distinguish it here.
+
+Example 1: Get the total consumption per user
+
+Following [Data Model Aggregate Model](data-model.html#Aggregate Model) in the
**Aggregate Model** section, the Base table structure is as follows:
+
+| ColumnName        | Type         | AggregationType | Comment                                         |
+|-------------------|--------------|-----------------|-------------------------------------------------|
+| user\_id          | LARGEINT     |                 | user id                                         |
+| date              | DATE         |                 | date when the data is imported                  |
+| timestamp         | DATETIME     |                 | time when the data is imported, to the second   |
+| city              | VARCHAR (20) |                 | user city                                       |
+| age               | SMALLINT     |                 | user age                                        |
+| sex               | TINYINT      |                 | user gender                                     |
+| last\_visit\_date | DATETIME     | REPLACE         | last user access time                           |
+| cost              | BIGINT       | SUM             | total user consumption                          |
+| max\_dwell\_time  | INT          | MAX             | maximum user dwell time                         |
+| min\_dwell\_time  | INT          | MIN             | minimum user dwell time                         |
+
+The data stored are as follows:
+
+| user\_id | date       | timestamp           | city      | age  | sex  | last\_visit\_date   | cost | max\_dwell\_time | min\_dwell\_time |
+|----------|------------|---------------------|-----------|------|------|---------------------|------|------------------|------------------|
+| 10000    | 2017-10-01 | 2017-10-01 08:00:05 | Beijing   | 20   | 0    | 2017-10-01 06:00    | 20   | 10               | 10               |
+| 10000    | 2017-10-01 | 2017-10-01 09:00:05 | Beijing   | 20   | 0    | 2017-10-01 07:00    | 15   | 2                | 2                |
+| 10001    | 2017-10-01 | 2017-10-01 18:12:10 | Beijing   | 30   | 1    | 2017-10-01 17:05:45 | 2    | 22               | 22               |
+| 10002    | 2017-10-02 | 2017-10-02 13:10:00 | Shanghai  | 20   | 1    | 2017-10-02 12:59:12 | 200  | 5                | 5                |
+| 10003    | 2017-10-02 | 2017-10-02 13:15:00 | Guangzhou | 32   | 0    | 2017-10-02 11:20:00 | 30   | 11               | 11               |
+| 10004    | 2017-10-01 | 2017-10-01 12:12:48 | Shenzhen  | 35   | 0    | 2017-10-01 10:00:15 | 100  | 3                | 3                |
+| 10004    | 2017-10-03 | 2017-10-03 12:38:20 | Shenzhen  | 35   | 0    | 2017-10-03 10:20:22 | 11   | 6                | 6                |
+
+On this basis, we create a ROLLUP:
+
+| ColumnName |
+|------------|
+| user\_id |
+| cost |
+
+The ROLLUP contains only two columns: user\_id and cost. After the creation,
the data stored in the ROLLUP is as follows:
+
+| user\_id | cost |
+| -------- | ---- |
+| 10000 | 35 |
+| 10001 | 2 |
+| 10002 | 200 |
+| 10003 | 30 |
+| 10004 | 111 |
+
+As you can see, ROLLUP retains only the results of SUM on the cost column for
each user_id. So when we do the following query:
+
+`SELECT user_id, sum(cost) FROM table GROUP BY user_id;`
+
+Doris automatically hits the ROLLUP table, thus completing the aggregated
query by scanning only a very small amount of data.
+
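+For reference, a minimal sketch of how such a ROLLUP could be created; the
+rollup name is an assumption:
+
+```sql
+ALTER TABLE table ADD ROLLUP rollup_cost_userid(user_id, cost);
+```
+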
+Example 2: Get the total consumption and the longest and shortest page
dwell times for users of different ages in different cities
+
+Following example 1, we create a ROLLUP based on the Base table:
+
+| ColumnName       | Type         | AggregationType | Comment                 |
+|------------------|--------------|-----------------|-------------------------|
+| city             | VARCHAR (20) |                 | user city               |
+| age              | SMALLINT     |                 | user age                |
+| cost             | BIGINT       | SUM             | total user consumption  |
+| max\_dwell\_time | INT          | MAX             | maximum user dwell time |
+| min\_dwell\_time | INT          | MIN             | minimum user dwell time |
+
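+For reference, a minimal sketch of how this ROLLUP could be created; the
+rollup name is an assumption:
+
+```sql
+ALTER TABLE table ADD ROLLUP rollup_city_age_cost_maxd_mind(city, age, cost,
+max_dwell_time, min_dwell_time);
+```
+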
+After the creation, the data stored in the ROLLUP is as follows:
+
+| city | age | cost | max\_dwell\_time | min\_dwell\_time |
+| --------- | ---- | ---- | ---------------- | ---------------- |
+| Beijing | 20 | 35 | 10 | 2 |
+| Beijing | 30 | 2 | 22 | 22 |
+| Shanghai | 20 | 200 | 5 | 5 |
+| Guangzhou | 32 | 30 | 11 | 11 |
+| Shenzhen | 35 | 111 | 6 | 3 |
+
+When we do the following queries:
+
+```sql
+mysql> SELECT city, age, sum(cost), max(max_dwell_time), min(min_dwell_time)
FROM table GROUP BY city, age;
+mysql> SELECT city, sum(cost), max(max_dwell_time), min(min_dwell_time) FROM
table GROUP BY city;
+mysql> SELECT city, age, sum(cost), min(min_dwell_time) FROM table GROUP BY
city, age;
+```
+
+Doris automatically hits the ROLLUP table.
+
+### ROLLUP in Duplicate Model
+
+Because the Duplicate model has no aggregation semantics, ROLLUP in this
model loses the meaning of "rolling up"; it is used only to adjust the column
order to hit the prefix index. In the next section, we introduce the [prefix
index](data-model.html#prefix index) of the data model, and how to use ROLLUP
to change the prefix index in order to achieve better query efficiency.
+
+## ROLLUP adjusts prefix index
+
+Because the column order is specified when a table is built, there is only
one prefix index per table. This may be inefficient for queries that use
condition columns which cannot hit the prefix index. Therefore, we can manually
adjust the column order by creating a ROLLUP. An example follows.
+
+The structure of the Base table is as follows:
+
+| ColumnName | Type |
+| ---------------- | ------------ |
+| user\_id | BIGINT |
+| age | INT |
+| message | VARCHAR(100) |
+| max\_dwell\_time | DATETIME |
+| min\_dwell\_time | DATETIME |
+
+On this basis, we can create a ROLLUP table:
+
+| ColumnName | Type |
+| ---------------- | ------------ |
+| age | INT |
+| user\_id | BIGINT |
+| message | VARCHAR(100) |
+| max\_dwell\_time | DATETIME |
+| min\_dwell\_time | DATETIME |
+
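+For reference, a minimal sketch of how this reordered ROLLUP could be
+created; the rollup name is an assumption:
+
+```sql
+ALTER TABLE table ADD ROLLUP rollup_age_userid(age, user_id, message,
+max_dwell_time, min_dwell_time);
+```
+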
+As you can see, the columns of ROLLUP and Base tables are exactly the same,
just changing the order of user_id and age. So when we do the following query:
+
+```sql
+mysql> SELECT * FROM table where age=20 and message LIKE "%error%";
+```
+
+The ROLLUP table is preferred because the prefix index of ROLLUP matches
better.
+
+## Some Explanations of ROLLUP
+
+* The fundamental role of ROLLUP is to improve the query efficiency of some
queries (whether by aggregating to reduce the amount of data or by modifying
column order to match prefix indexes). Therefore, the meaning of ROLLUP has
gone beyond the scope of "roll-up". That's why we named it Materialized Index
in the source code.
+* ROLLUP is attached to the Base table and can be seen as an auxiliary data
structure of the Base table. Users can create or delete ROLLUP based on the
Base table, but cannot explicitly specify a query for a ROLLUP in the query.
Whether ROLLUP is hit or not is entirely determined by the Doris system.
+* ROLLUP data is stored in separate physical storage. Therefore, the more
ROLLUPs you create, the more disk space is occupied. Creating ROLLUPs also has
an impact on import speed (the ETL phase of import automatically generates all
ROLLUP data), but it never reduces query efficiency (it can only improve it).
+* Data updates for ROLLUP are fully synchronized with the Base table.
Users need not care about this problem.
+* Columns in a ROLLUP are aggregated in exactly the same way as in the Base
table; the aggregation does not need to be specified, and cannot be modified,
when creating a ROLLUP.
+* A necessary (but not sufficient) condition for a query to hit a ROLLUP is
that **all columns involved in the query (including the query condition
columns in the select list and where clause)** exist in the ROLLUP's columns.
Otherwise, the query can only hit the Base table.
+* Certain types of queries (such as `count(*)`) cannot hit ROLLUP under any
conditions. See the next section **Limitations of the aggregation model**.
+* The query execution plan can be obtained by `EXPLAIN your_sql;` command, and
in the execution plan, whether ROLLUP has been hit or not can be checked.
+* Base tables and all created ROLLUP can be displayed by `DESC tbl_name ALL;`
statement.
+
+## Query
+
As a materialized aggregate view in Doris, Rollup can play two roles in queries:
* Index
@@ -33,13 +183,7 @@ As a polymer view in Doris, Rollup can play two roles in
queries:
However, in order to hit a Rollup, certain conditions need to be met. The
value of PreAggregation of the ScanNode in the execution plan can be used to
determine whether a Rollup can be hit, and the Rollup field can be used to
determine which Rollup table is hit.
-## Noun Interpretation
-
-Base: Base table.
-
-Rollup: Generally, it refers to the Rollup tables created based on Base
tables, but in some scenarios, it includes Base and Rollup tables.
-
-## Index
+### Index
Doris's prefix index has been introduced in the previous query practice, that
is, Doris will generate the first 36 bytes in the Base/Rollup table separately
in the underlying storage engine (with varchar type, the prefix index may be
less than 36 bytes, varchar will truncate the prefix index, and use up to 20
bytes of varchar). A sorted sparse index data (data is also sorted, positioned
by index, and then searched by dichotomy in the data), and then matched each
Base/Rollup prefix index [...]
@@ -133,13 +277,13 @@ rollup_index4(k4, k6, k5, k1, k2, k3, k7)
Conditions on columns that can use the prefix index need to be `=` `<` `>`
`<=` `>=` `in` `between`; these conditions must be side-by-side and connected
with `and`. The prefix index cannot be hit for `or`, `!=`, and so
on. Then look at the following query:
-```
+```sql
SELECT * FROM test WHERE k1 = 1 AND k2 > 3;
```
With conditions on k1 and k2, only the first column of the Base table contains
k1 in the condition, so the Base table `test` matches the longest prefix index. Explain:
-```
+```text
| 0:OlapScanNode
| TABLE: test
| PREAGGREGATION: OFF. Reason: No AggregateInfo
@@ -151,11 +295,13 @@ With the conditions on K1 and k2, check that only the
first column of Base conta
| avgRowSize=0.0
| numNodes=0
| tuple ids: 0
-```
+```
Look again at the following queries:
-`SELECT * FROM test WHERE k4 =1 AND k5 > 3;`
+```sql
+SELECT * FROM test WHERE k4 = 1 AND k5 > 3;
+```
With conditions on k4 and k5, the first column of both rollup_index3 and
rollup_index4 contains k4, but the second column of rollup_index3 contains k5,
so rollup_index3 matches the longest prefix index.
@@ -175,7 +321,9 @@ With K4 and K5 conditions, check that the first column of
rollup_index3 and roll
Now we try to match the conditions on the column containing varchar, as
follows:
-`SELECT * FROM test WHERE k9 IN ("xxx", "yyyy") AND k1 = 10;`
+```sql
+SELECT * FROM test WHERE k9 IN ("xxx", "yyyy") AND k1 = 10;
+```
There are K9 and K1 conditions. The first column of rollup_index1 and
rollup_index2 contains k9. It is reasonable to choose either rollup here to hit
the prefix index and randomly select the same one (because there are just 20
bytes in varchar, and the prefix index is truncated in less than 36 bytes). The
current strategy here will continue to match k1, because the second
rollup_index1 is listed as k1, so rollup_index1 is chosen, in fact, the latter
K1 condition will not play an accelera [...]
@@ -191,11 +339,13 @@ There are K9 and K1 conditions. The first column of
rollup_index1 and rollup_ind
| avgRowSize=0.0
| numNodes=0
| tuple ids: 0
-```
+```
Finally, look at a query that can be hit by more than one Rollup:
-`Select * from test where K4 < 1000 and K5 = 80 and K6 = 10000;`
+```sql
+SELECT * FROM test WHERE k4 < 1000 AND k5 = 80 AND k6 = 10000;
+```
There are three conditions: k4, K5 and k6. The first three columns of
rollup_index3 and rollup_index4 contain these three columns respectively. So
the prefix index length matched by them is the same. Both can be selected. The
current default strategy is to select a rollup created earlier. Here is
rollup_index3.
@@ -215,11 +365,13 @@ There are three conditions: k4, K5 and k6. The first
three columns of rollup_ind
If you modify the above query slightly as follows:
-`SELECT * FROM test WHERE k4 < 1000 AND k5 = 80 OR k6 >= 10000;`
+```sql
+SELECT * FROM test WHERE k4 < 1000 AND k5 = 80 OR k6 >= 10000;
+```
The query here cannot hit the prefix index. (Even the Min/Max and
BloomFilter indexes in the Doris storage engine don't work.)
-## Aggregate data
+### Aggregate data
Of course, the aggregated-data function is indispensable for general
aggregate views. Such materialized views are very helpful for aggregated
queries or report queries. To hit an aggregate view, the following
prerequisites are needed:
@@ -279,7 +431,9 @@ The following Base table and Rollup:
See the following queries:
-`SELECT SUM(k11) FROM test_rollup WHERE k1 = 10 AND k2 > 200 AND k3 in
(1,2,3);`
+```sql
+SELECT SUM(k11) FROM test_rollup WHERE k1 = 10 AND k2 > 200 AND k3 in (1,2,3);
+```
Firstly, it judges whether the query can hit the aggregated Rollup table.
After checking the graph above, it is possible. The query contains three
conditions, on k1, k2, and k3. The first three columns of test_rollup,
rollup1, and rollup2 all contain these three condition columns, so the matched
prefix index length is the same. Comparing the number of rows, rollup2
obviously has the highest aggregation degree, so rollup2 is selected because
it has the minimum number of rows.
diff --git a/new-docs/zh-CN/data-table/advance-usage.md
b/new-docs/zh-CN/data-table/advance-usage.md
index 3755499dd0..4b23c1aed1 100644
--- a/new-docs/zh-CN/data-table/advance-usage.md
+++ b/new-docs/zh-CN/data-table/advance-usage.md
@@ -24,4 +24,271 @@ specific language governing permissions and limitations
under the License.
-->
-# Advanced Usage Guide
\ No newline at end of file
+# Advanced Usage Guide
+
+Here we introduce some of Doris's advanced features.
+
+## Table Structural Change
+
+The Schema of the table can be modified using the [ALTER TABLE
COLUMN](../sql-manual/sql-reference-v2/Data-Definition-Statements/Alter/ALTER-TABLE-COLUMN.html)
command, including the following modifications:
+
+- Add columns
+- Delete columns
+- Modify column types
+- Change column order
+
+The following example illustrates a table structural change:
+
+The Schema of the original table table1 is as follows:
+
+```text
++----------+-------------+------+-------+---------+-------+
+| Field | Type | Null | Key | Default | Extra |
++----------+-------------+------+-------+---------+-------+
+| siteid | int(11) | No | true | 10 | |
+| citycode | smallint(6) | No | true | N/A | |
+| username | varchar(32) | No | true | | |
+| pv | bigint(20) | No | false | 0 | SUM |
++----------+-------------+------+-------+---------+-------+
+```
+
+We add a new column uv, type BIGINT, aggregation type SUM, default value 0:
+
+```sql
+ALTER TABLE table1 ADD COLUMN uv BIGINT SUM DEFAULT '0' after pv;
+```
+
+After successful submission, you can view the progress of the job with the following command:
+
+```sql
+SHOW ALTER TABLE COLUMN;
+```
+
+When the job state is `FINISHED`, the job is completed. The new Schema has taken effect.
+
+After ALTER TABLE is completed, you can view the latest Schema through `DESC TABLE`.
+
+```sql
+mysql> DESC table1;
++----------+-------------+------+-------+---------+-------+
+| Field | Type | Null | Key | Default | Extra |
++----------+-------------+------+-------+---------+-------+
+| siteid | int(11) | No | true | 10 | |
+| citycode | smallint(6) | No | true | N/A | |
+| username | varchar(32) | No | true | | |
+| pv | bigint(20) | No | false | 0 | SUM |
+| uv | bigint(20) | No | false | 0 | SUM |
++----------+-------------+------+-------+---------+-------+
+5 rows in set (0.00 sec)
+```
+
+You can cancel the currently running job with the following command:
+
+```sql
+CANCEL ALTER TABLE COLUMN FROM table1;
+```
+
+For more help, see `HELP ALTER TABLE`.
+
+## Rollup
+
+A Rollup can be understood as a materialized index structure of a table: **materialized** because its data is stored physically and independently, and **index** because a Rollup can reorder columns to raise the hit rate of the prefix index, or drop key columns to raise the aggregation degree of the data.
+
+Various Rollup changes can be performed with [ALTER TABLE ROLLUP](../sql-manual/sql-reference-v2/Data-Definition-Statements/Alter/ALTER-TABLE-ROLLUP.html).
+
+The following example illustrates this.
+
+The schema of the original table1 is now as follows:
+
+```text
++----------+-------------+------+-------+---------+-------+
+| Field | Type | Null | Key | Default | Extra |
++----------+-------------+------+-------+---------+-------+
+| siteid | int(11) | No | true | 10 | |
+| citycode | smallint(6) | No | true | N/A | |
+| username | varchar(32) | No | true | | |
+| pv | bigint(20) | No | false | 0 | SUM |
+| uv       | bigint(20)  | No   | false | 0       | SUM   |
++----------+-------------+------+-------+---------+-------+
+```
+
+In table1, the detail rows are keyed by siteid, citycode, and username together, and the pv field is aggregated over that key. If the business side frequently needs to view total pv per city, a Rollup with only citycode and pv can be created:
+
+```sql
+ALTER TABLE table1 ADD ROLLUP rollup_city(citycode, pv);
+```
+
+After the job is submitted successfully, you can check its progress with the following command:
+
+```sql
+SHOW ALTER TABLE ROLLUP;
+```
+
+When the job state is `FINISHED`, the job is done.
+
+After the Rollup is created, you can use `DESC table1 ALL` to view the table's Rollup information.
+
+```sql
+mysql> desc table1 all;
++-------------+----------+-------------+------+-------+---------+-------+
+| IndexName | Field | Type | Null | Key | Default | Extra |
++-------------+----------+-------------+------+-------+---------+-------+
+| table1 | siteid | int(11) | No | true | 10 | |
+| | citycode | smallint(6) | No | true | N/A | |
+| | username | varchar(32) | No | true | | |
+| | pv | bigint(20) | No | false | 0 | SUM |
+| | uv | bigint(20) | No | false | 0 | SUM |
+| | | | | | | |
+| rollup_city | citycode | smallint(6) | No | true | N/A | |
+| | pv | bigint(20) | No | false | 0 | SUM |
++-------------+----------+-------------+------+-------+---------+-------+
+8 rows in set (0.01 sec)
+```
+
+You can cancel the currently running job with the following command:
+
+```sql
+CANCEL ALTER TABLE ROLLUP FROM table1;
+```
+
+After a Rollup is created, queries need not (and cannot) target it explicitly; you still query the original table, and Doris automatically decides whether the Rollup should be used. Whether a Rollup is hit can be checked with the `EXPLAIN your_sql;` command, as sketched below.
+
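+A quick way to check, reusing the rollup_city example above (a sketch; the `rollup` field of the plan's OlapScanNode names the index actually chosen):
+
+```sql
+-- If the Rollup is hit, the ScanNode in the plan shows "rollup: rollup_city".
+EXPLAIN SELECT citycode, SUM(pv) FROM table1 GROUP BY citycode;
+```
+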
+For more help, see `HELP ALTER TABLE`.
+
+## Querying Tables
+
+### Memory Limit
+
+To prevent a single user query from consuming too much memory, queries are memory-controlled: by default, one query task may use no more than 2GB of memory on a single BE node.
+
+If you encounter a `Memory limit exceeded` error, the memory limit has generally been exceeded.
+
+When that happens, try to resolve it first by optimizing your SQL statements.
+
+If you find that 2GB is indeed insufficient, you can set the memory parameter manually.
+
+Show the query memory limit:
+
+```sql
+mysql> SHOW VARIABLES LIKE "%mem_limit%";
++---------------+------------+
+| Variable_name | Value |
++---------------+------------+
+| exec_mem_limit| 2147483648 |
++---------------+------------+
+1 row in set (0.00 sec)
+```
+
+The unit of `exec_mem_limit` is bytes. Its value can be changed with the `SET` command, for example to 8GB:
+
+```sql
+mysql> SET exec_mem_limit = 8589934592;
+Query OK, 0 rows affected (0.00 sec)
+mysql> SHOW VARIABLES LIKE "%mem_limit%";
++---------------+------------+
+| Variable_name | Value |
++---------------+------------+
+| exec_mem_limit| 8589934592 |
++---------------+------------+
+1 row in set (0.00 sec)
+```
+
+> - The change above is session-level and only effective within the current connection session; after reconnecting, the value reverts to the default.
+> - To change the global variable, set it like this: `SET GLOBAL exec_mem_limit = 8589934592;`. After setting it, disconnect the session and log in again; the parameter then takes effect permanently.
+
+### Query Timeout
+
+The current default maximum query time is 300 seconds; if a query does not finish within 300 seconds, it is cancelled by the Doris system. This parameter lets you customize your application's timeout, achieving a blocking behavior similar to wait(timeout).
+
+Check the current timeout setting:
+
+```sql
+mysql> SHOW VARIABLES LIKE "%query_timeout%";
++---------------+-------+
+| Variable_name | Value |
++---------------+-------+
+| QUERY_TIMEOUT | 300 |
++---------------+-------+
+1 row in set (0.00 sec)
+```
+
+Change the timeout to one minute:
+
+```sql
+mysql> SET query_timeout = 60;
+Query OK, 0 rows affected (0.00 sec)
+```
+
+> - The timeout is currently checked every 5 seconds, so timeouts of less than 5 seconds are not very accurate.
+> - The change above is likewise session-level; use `SET GLOBAL` to make it effective globally.
+
+### Broadcast/Shuffle Join
+
+By default, the system implements a join by filtering the small table on its conditions, broadcasting it to each node hosting the large table to build an in-memory hash table, and then streaming the large table's data through a hash join. If the filtered small table cannot fit in memory, however, the join cannot complete, and the usual symptom is a memory-limit error first.
+
+In that case, it is recommended to explicitly request a Shuffle Join, also known as a Partitioned Join: both the small and the large table are hashed by the join key and joined in a distributed manner, spreading the memory cost across all compute nodes in the cluster.
+
+Doris automatically attempts a Broadcast Join and automatically switches to a Shuffle Join if the small table is estimated to be too large. Note that an explicitly requested Broadcast Join is also switched to a Shuffle Join automatically in this case.
+
+Using Broadcast Join (default):
+
+```sql
+mysql> select sum(table1.pv) from table1 join table2 where table1.siteid = 2;
++--------------------+
+| sum(`table1`.`pv`) |
++--------------------+
+| 10 |
++--------------------+
+1 row in set (0.20 sec)
+```
+
+Using Broadcast Join (explicitly specified):
+
+```sql
+mysql> select sum(table1.pv) from table1 join [broadcast] table2 where
table1.siteid = 2;
++--------------------+
+| sum(`table1`.`pv`) |
++--------------------+
+| 10 |
++--------------------+
+1 row in set (0.20 sec)
+```
+
+Using Shuffle Join:
+
+```sql
+mysql> select sum(table1.pv) from table1 join [shuffle] table2 where
table1.siteid = 2;
++--------------------+
+| sum(`table1`.`pv`) |
++--------------------+
+| 10 |
++--------------------+
+1 row in set (0.15 sec)
+```
+
+### Query Retry and High Availability
+
+When multiple FE nodes are deployed, you can deploy a load-balancing layer on top of them to achieve high availability for Doris.
+
+Here are some high-availability options:
+
+**Option 1**
+
+Implement retry and load balancing in your own application-layer code: for example, when a connection is found dead, automatically retry on another connection. This requires the application to be configured with the addresses of multiple Doris frontend nodes.
+
+**Option 2**
+
+If you connect to Doris with the MySQL JDBC connector, you can use JDBC's automatic retry mechanism:
+
+```text
+jdbc:mysql://[host1][:port1],[host2][:port2][,[host3][:port3]]...[/[database]][?propertyName1=propertyValue1[&propertyName2=propertyValue2]...]
+```
+
+**Option 3**
+
+The application can connect to a MySQL Proxy deployed on the same machine as the application, and configure the proxy's failover and load-balancing features to the same end. See:
+
+https://dev.mysql.com/doc/refman/5.6/en/proxy-users.html
+
diff --git a/new-docs/zh-CN/data-table/basic-usage.md
b/new-docs/zh-CN/data-table/basic-usage.md
index 8ed57cb1fe..dd6601339f 100644
--- a/new-docs/zh-CN/data-table/basic-usage.md
+++ b/new-docs/zh-CN/data-table/basic-usage.md
@@ -24,4 +24,395 @@ specific language governing permissions and limitations
under the License.
-->
-# Basic Usage Guide
\ No newline at end of file
+# Basic Usage Guide
+
+Doris communicates over the MySQL protocol, so users can connect to a Doris cluster with a MySQL client or MySQL JDBC. MySQL client versions after 5.1 are recommended, because versions before 5.1 do not support usernames longer than 16 characters. This document uses the MySQL client to walk through a complete workflow demonstrating the basic usage of Doris.
+
+## Creating Users
+
+### Root Login and Password Change
+
+Doris has built-in root and admin users, both with an empty password by default. After starting Doris, you can connect to the cluster as root or admin. The following command logs in to Doris and drops you into Doris's MySQL command-line interface:
+
+```bash
+[root@doris ~]# mysql -h FE_HOST -P9030 -uroot
+Welcome to the MySQL monitor. Commands end with ; or \g.
+Your MySQL connection id is 41
+Server version: 5.1.0 Doris version 1.0.0-preview2-b48ee2734
+
+Copyright (c) 2000, 2022, Oracle and/or its affiliates.
+
+Oracle is a registered trademark of Oracle Corporation and/or its
+affiliates. Other names may be trademarks of their respective
+owners.
+
+Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
+
+mysql>
+```
+
+> `FE_HOST` is the IP address of any FE node, and `9030` is the query_port configured in fe.conf.
+
+After logging in, you can change the root password with the following command:
+
+```sql
+mysql> SET PASSWORD FOR 'root' = PASSWORD('your_password');
+Query OK, 0 rows affected (0.00 sec)
+```
+
+> `your_password` is the new password for the `root` user; choose whatever you like, but a strong password is recommended for security. Log in with the new password next time.
+
+### Creating a New User
+
+We can create an ordinary user `test` with the following command:
+
+```sql
+mysql> CREATE USER 'test' IDENTIFIED BY 'test_passwd';
+Query OK, 0 rows affected (0.00 sec)
+```
+
+Subsequent logins can then use the following command:
+
+```bash
+[root@doris ~]# mysql -h FE_HOST -P9030 -utest -ptest_passwd
+```
+
+> Note: a newly created ordinary user has no privileges by default; see the account authorization section below.
+
+## Creating Tables and Importing Data
+
+### Creating a Database
+
+Initially, a database can be created by the root or admin user:
+
+```sql
+CREATE DATABASE example_db;
+```
+
+> For any command, `HELP command;` shows detailed syntax help, e.g. `HELP CREATE DATABASE;`. You can also consult the official [SHOW CREATE DATABASE](../sql-manual/sql-reference-v2/Show-Statements/SHOW-CREATE-DATABASE.html) command manual.
+>
+> If you don't know a command's full name, you can fuzzy-search with "help" plus part of the command. For example, typing `HELP CREATE` matches commands such as `CREATE DATABASE`, `CREATE TABLE`, and `CREATE USER`:
+>
+> ```sql
+> mysql> HELP CREATE;
+> Many help items for your request exist.
+> To make a more specific request, please type 'help <item>',
+> where <item> is one of the following
+> topics:
+> CREATE CLUSTER
+> CREATE DATABASE
+> CREATE ENCRYPTKEY
+> CREATE FILE
+> CREATE FUNCTION
+> CREATE INDEX
+> CREATE MATERIALIZED VIEW
+> CREATE REPOSITORY
+> CREATE RESOURCE
+> CREATE ROLE
+> CREATE SYNC JOB
+> CREATE TABLE
+> CREATE USER
+> CREATE VIEW
+> ROUTINE LOAD
+> SHOW CREATE FUNCTION
+> SHOW CREATE ROUTINE LOAD
+> ```
+
+After the database is created, you can view its information via [SHOW DATABASES](../sql-manual/sql-reference-v2/Show-Statements/SHOW-DATABASES.html#show-databases).
+
+```sql
+mysql> SHOW DATABASES;
++--------------------+
+| Database |
++--------------------+
+| example_db |
+| information_schema |
++--------------------+
+2 rows in set (0.00 sec)
+```
+
+> The information_schema database exists for compatibility with the MySQL protocol, and its contents may not be very accurate in practice. Information about a specific database is therefore best obtained by querying that database directly.
+
+### Account Authorization
+
+After example_db is created, the root/admin account can use the [GRANT](../sql-manual/sql-reference-v2/Account-Management-Statements/GRANT.html) command to grant read/write privileges on example_db to an ordinary account such as test. After authorization, you can log in as test and operate on the example_db database.
+
+```sql
+mysql> GRANT ALL ON example_db TO test;
+Query OK, 0 rows affected (0.01 sec)
+```
+
+### Creating Tables
+
+Use the [CREATE TABLE](../sql-manual/sql-reference-v2/Data-Definition-Statements/Create/CREATE-TABLE.html) command to create a table. For more detailed parameters, run `HELP CREATE TABLE;`.
+
+First, we need to switch databases with the [USE](../sql-manual/sql-reference-v2/Utility-Statements/USE.html) command:
+
+```sql
+mysql> USE example_db;
+Database changed
+```
+
+Doris supports two ways of creating tables: [composite partitioning and single partitioning](data-partition.html#复合分区与单分区). Below, using the aggregate model as an example, we demonstrate how to create tables with each kind of partitioning.
+
+#### Single Partitioning
+
+Create a logical table named table1, bucketed by the siteid column into 10 buckets.
+
+The schema of this table is as follows:
+
+- siteid: type INT (4 bytes), default value 10
+- citycode: type SMALLINT (2 bytes)
+- username: type VARCHAR, maximum length 32, default value empty string
+- pv: type BIGINT (8 bytes), default value 0; this is a metric column, which Doris aggregates internally, and its aggregation method is SUM
+
+The CREATE TABLE statement is as follows:
+
+```sql
+CREATE TABLE table1
+(
+ siteid INT DEFAULT '10',
+ citycode SMALLINT,
+ username VARCHAR(32) DEFAULT '',
+ pv BIGINT SUM DEFAULT '0'
+)
+AGGREGATE KEY(siteid, citycode, username)
+DISTRIBUTED BY HASH(siteid) BUCKETS 10
+PROPERTIES("replication_num" = "1");
+```
+
+#### Composite Partitioning
+
+Create a logical table named table2.
+
+The schema of this table is as follows:
+
+- event_day: type DATE, no default value
+- siteid: type INT (4 bytes), default value 10
+- citycode: type SMALLINT (2 bytes)
+- username: type VARCHAR, maximum length 32, default value empty string
+- pv: type BIGINT (8 bytes), default value 0; this is a metric column, which Doris aggregates internally, and its aggregation method is SUM
+
+We use the event_day column as the partition column and create 3 partitions: p201706, p201707, p201708
+
+- p201706: range [min value, 2017-07-01)
+- p201707: range [2017-07-01, 2017-08-01)
+- p201708: range [2017-08-01, 2017-09-01)
+
+> Note that the intervals are left-closed and right-open.
+
+Each partition is hash-bucketed by siteid into 10 buckets.
+
+The CREATE TABLE statement is as follows:
+
+```sql
+CREATE TABLE table2
+(
+ event_day DATE,
+ siteid INT DEFAULT '10',
+ citycode SMALLINT,
+ username VARCHAR(32) DEFAULT '',
+ pv BIGINT SUM DEFAULT '0'
+)
+AGGREGATE KEY(event_day, siteid, citycode, username)
+PARTITION BY RANGE(event_day)
+(
+ PARTITION p201706 VALUES LESS THAN ('2017-07-01'),
+ PARTITION p201707 VALUES LESS THAN ('2017-08-01'),
+ PARTITION p201708 VALUES LESS THAN ('2017-09-01')
+)
+DISTRIBUTED BY HASH(siteid) BUCKETS 10
+PROPERTIES("replication_num" = "1");
+```
+
+After the tables are created, you can view the table information in example_db:
+
+```sql
+mysql> SHOW TABLES;
++----------------------+
+| Tables_in_example_db |
++----------------------+
+| table1 |
+| table2 |
++----------------------+
+2 rows in set (0.01 sec)
+
+mysql> DESC table1;
++----------+-------------+------+-------+---------+-------+
+| Field | Type | Null | Key | Default | Extra |
++----------+-------------+------+-------+---------+-------+
+| siteid | int(11) | Yes | true | 10 | |
+| citycode | smallint(6) | Yes | true | N/A | |
+| username | varchar(32) | Yes | true | | |
+| pv | bigint(20) | Yes | false | 0 | SUM |
++----------+-------------+------+-------+---------+-------+
+4 rows in set (0.00 sec)
+
+mysql> DESC table2;
++-----------+-------------+------+-------+---------+-------+
+| Field | Type | Null | Key | Default | Extra |
++-----------+-------------+------+-------+---------+-------+
+| event_day | date | Yes | true | N/A | |
+| siteid | int(11) | Yes | true | 10 | |
+| citycode | smallint(6) | Yes | true | N/A | |
+| username | varchar(32) | Yes | true | | |
+| pv | bigint(20) | Yes | false | 0 | SUM |
++-----------+-------------+------+-------+---------+-------+
+5 rows in set (0.00 sec)
+```
+
+> Notes:
+
+> 1. The tables above are created with a single replica by setting replication_num; Doris recommends the default of 3 replicas to ensure high availability.
+> 2. Partitions of a composite-partitioned table can be added and dropped dynamically; see the Partition-related parts of `HELP ALTER TABLE;`.
+> 3. Data can be imported into a specified partition; see `HELP LOAD;`.
+> 4. The table schema can be modified dynamically; see `HELP ALTER TABLE;`.
+> 5. Rollups can be added to a table to improve query performance; see the Rollup sections of the advanced usage guide.
+> 6. The Null attribute of columns defaults to true, which has some impact on query performance.
+
+### Importing Data
+
+Doris supports multiple import methods; see the [data import](../data-operate/import/load-manual.html) documentation for details. Here we use Stream Load and Broker Load as examples.
+
+#### Stream Load
+
+Stream Load transfers data to Doris over HTTP and can import local data directly without relying on other systems or components. For detailed syntax, run `HELP STREAM LOAD;`.
+
+Example 1: with "table1_20170707" as the Label, import the local file table1_data into table table1.
+
+```bash
+curl --location-trusted -u test:test_passwd -H "label:table1_20170707" -H
"column_separator:," -T table1_data
http://FE_HOST:8030/api/example_db/table1/_stream_load
+```
+
+> 1. FE_HOST is the IP of any FE node, and 8030 is the http_port in fe.conf.
+> 2. You can also import via any BE's IP and the webserver_port in be.conf, e.g. `BE_HOST:8040`.
+
+The local file `table1_data` uses `,` as the field separator; its contents are:
+
+```text
+1,1,jim,2
+2,1,grace,2
+3,2,tom,2
+4,3,bush,3
+5,3,helen,3
+```
+
+Example 2: with "table2_20170707" as the Label, import the local file table2_data into table table2.
+
+```bash
+curl --location-trusted -u test:test -H "label:table2_20170707" -H
"column_separator:|" -T table2_data
http://127.0.0.1:8030/api/example_db/table2/_stream_load
+```
+
+The local file `table2_data` uses `|` as the field separator; its contents are:
+
+```text
+2017-07-03|1|1|jim|2
+2017-07-05|2|1|grace|2
+2017-07-12|3|2|tom|2
+2017-07-15|4|3|bush|3
+2017-07-12|5|3|helen|3
+```
+
+> Notes:
+>
+> 1. For Stream Load, it is recommended to keep files within 10GB; overly large files make failure retries more costly.
+> 2. Each batch of imported data needs a Label, preferably a string related to that batch for easy reading and management. Based on the Label, Doris guarantees that within one database the same batch can be imported successfully only once. Labels of failed tasks can be reused.
+> 3. Stream Load is a synchronous command: a successful return means the data has been imported; a failed return means it has not.
+
+#### Broker Load
+
+Broker Load reads and imports data on external storage through deployed Broker processes. For more help, run `HELP BROKER LOAD;`.
+
+Example: with "table1_20170708" as the Label, import a file on HDFS into table table1:
+
+```sql
+LOAD LABEL table1_20170708
+(
+ DATA INFILE("hdfs://your.namenode.host:port/dir/table1_data")
+ INTO TABLE table1
+)
+WITH BROKER hdfs
+(
+ "username"="hdfs_user",
+ "password"="hdfs_password"
+)
+PROPERTIES
+(
+ "timeout"="3600",
+ "max_filter_ratio"="0.1"
+);
+```
+
+Broker Load is an asynchronous command; success of the statement above only means the task was submitted successfully. Whether the import succeeded must be checked via `SHOW LOAD;`, for example:
+
+```sql
+SHOW LOAD WHERE LABEL = "table1_20170708";
+```
+
+In the returned result, a `State` field of `FINISHED` indicates that the import succeeded.
+
+For more about `SHOW LOAD`, run `HELP SHOW LOAD;`.
+
+An asynchronous import task can be cancelled before it finishes:
+
+```sql
+CANCEL LOAD WHERE LABEL = "table1_20170708";
+```
+
+## Querying Data
+
+### Simple Queries
+
+Query example:
+
+```sql
+mysql> SELECT * FROM table1 LIMIT 3;
++--------+----------+----------+------+
+| siteid | citycode | username | pv |
++--------+----------+----------+------+
+| 2 | 1 | 'grace' | 2 |
+| 5 | 3 | 'helen' | 3 |
+| 3 | 2 | 'tom' | 2 |
++--------+----------+----------+------+
+3 rows in set (0.01 sec)
+
+mysql> SELECT * FROM table1 ORDER BY citycode;
++--------+----------+----------+------+
+| siteid | citycode | username | pv |
++--------+----------+----------+------+
+| 2 | 1 | 'grace' | 2 |
+| 1 | 1 | 'jim' | 2 |
+| 3 | 2 | 'tom' | 2 |
+| 4 | 3 | 'bush' | 3 |
+| 5 | 3 | 'helen' | 3 |
++--------+----------+----------+------+
+5 rows in set (0.01 sec)
+```
+
+### Join Queries
+
+Query example:
+
+```sql
+mysql> SELECT SUM(table1.pv) FROM table1 JOIN table2 WHERE table1.siteid =
table2.siteid;
++--------------------+
+| sum(`table1`.`pv`) |
++--------------------+
+| 12 |
++--------------------+
+1 row in set (0.20 sec)
+```
+
+### Subqueries
+
+Query example:
+
+```sql
+mysql> SELECT SUM(pv) FROM table2 WHERE siteid IN (SELECT siteid FROM table1
WHERE siteid > 2);
++-----------+
+| sum(`pv`) |
++-----------+
+| 8 |
++-----------+
+1 row in set (0.13 sec)
+```
\ No newline at end of file
diff --git a/new-docs/zh-CN/data-table/data-partition.md
b/new-docs/zh-CN/data-table/data-partition.md
index a397bb1e08..d88307b245 100644
--- a/new-docs/zh-CN/data-table/data-partition.md
+++ b/new-docs/zh-CN/data-table/data-partition.md
@@ -24,4 +24,381 @@ specific language governing permissions and limitations
under the License.
-->
-# Data Partitioning
\ No newline at end of file
+# Data Partitioning
+
+This document mainly introduces table creation and data partitioning in Doris, as well as problems that may arise during table creation and how to solve them.
+
+## Basic Concepts
+
+In Doris, data is logically described in the form of tables.
+
+### Row & Column
+
+A table consists of rows (Row) and columns (Column):
+
+- Row: a row of the user's data;
+
+- Column: describes the different fields within a row.
+
+  Columns fall into two categories: Key and Value. From a business perspective, Key and Value correspond to dimension columns and metric columns respectively. From the perspective of the aggregate model, rows with identical Key columns are aggregated into one row, and the aggregation method of each Value column is specified by the user at table creation. For more on the aggregate model, see the [Doris data model](data-model.html).
+
+### Tablet & Partition
+
+In the Doris storage engine, user data is horizontally divided into data shards (Tablet, also called data buckets). Each Tablet contains a number of data rows. The data of different Tablets does not overlap, and Tablets are stored physically independently.
+
+Multiple Tablets logically belong to different partitions (Partition). A Tablet belongs to exactly one Partition, while a Partition contains several Tablets. Because Tablets are stored physically independently, Partitions can also be regarded as physically independent. The Tablet is the smallest physical storage unit for operations such as data movement and replication.
+
+Several Partitions make up a Table. The Partition can be regarded as the smallest logical management unit: data import and deletion can be, and in some cases can only be, performed against a single Partition.
+
+## Data Partitioning
+
+We use a table-creation operation to explain data partitioning in Doris.
+
+Table creation in Doris is a synchronous command: the result is returned when the SQL finishes executing, and a successful return means the table was created. For the exact syntax see [CREATE TABLE](../sql-manual/sql-reference-v2/Data-Definition-Statements/Create/CREATE-TABLE.html), or run `HELP CREATE TABLE;` for more help.
+
+This section introduces Doris table creation with an example.
+
+```sql
+-- Range Partition
+
+CREATE TABLE IF NOT EXISTS example_db.expamle_range_tbl
+(
+    `user_id` LARGEINT NOT NULL COMMENT "user id",
+    `date` DATE NOT NULL COMMENT "date when the data was loaded",
+    `timestamp` DATETIME NOT NULL COMMENT "timestamp when the data was loaded",
+    `city` VARCHAR(20) COMMENT "user's city",
+    `age` SMALLINT COMMENT "user age",
+    `sex` TINYINT COMMENT "user gender",
+    `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "user's last visit time",
+    `cost` BIGINT SUM DEFAULT "0" COMMENT "user's total spending",
+    `max_dwell_time` INT MAX DEFAULT "0" COMMENT "user's maximum dwell time",
+    `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "user's minimum dwell time"
+)
+ENGINE=OLAP
+AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
+PARTITION BY RANGE(`date`)
+(
+    PARTITION `p201701` VALUES LESS THAN ("2017-02-01"),
+    PARTITION `p201702` VALUES LESS THAN ("2017-03-01"),
+    PARTITION `p201703` VALUES LESS THAN ("2017-04-01")
+)
+DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
+PROPERTIES
+(
+    "replication_num" = "3",
+    "storage_medium" = "SSD",
+    "storage_cooldown_time" = "2018-01-01 12:00:00"
+);
+
+
+-- List Partition
+
+CREATE TABLE IF NOT EXISTS example_db.expamle_list_tbl
+(
+    `user_id` LARGEINT NOT NULL COMMENT "user id",
+    `date` DATE NOT NULL COMMENT "date when the data was loaded",
+    `timestamp` DATETIME NOT NULL COMMENT "timestamp when the data was loaded",
+    `city` VARCHAR(20) COMMENT "user's city",
+    `age` SMALLINT COMMENT "user age",
+    `sex` TINYINT COMMENT "user gender",
+    `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "user's last visit time",
+    `cost` BIGINT SUM DEFAULT "0" COMMENT "user's total spending",
+    `max_dwell_time` INT MAX DEFAULT "0" COMMENT "user's maximum dwell time",
+    `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "user's minimum dwell time"
+)
+ENGINE=OLAP
+AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
+PARTITION BY LIST(`city`)
+(
+    PARTITION `p_cn` VALUES IN ("Beijing", "Shanghai", "Hong Kong"),
+    PARTITION `p_usa` VALUES IN ("New York", "San Francisco"),
+    PARTITION `p_jp` VALUES IN ("Tokyo")
+)
+DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
+PROPERTIES
+(
+    "replication_num" = "3",
+    "storage_medium" = "SSD",
+    "storage_cooldown_time" = "2018-01-01 12:00:00"
+);
+
+```
+
+### Column Definitions
+
+Here we use only the AGGREGATE KEY data model as an example. For other data models, see the [Doris data model](./data-model.html).
+
+The basic column types can be viewed by running `HELP CREATE TABLE;` in the mysql client.
+
+In the AGGREGATE KEY data model, all columns without a specified aggregation method (SUM, REPLACE, MAX, MIN) are treated as Key columns, and the rest are Value columns.
+
+When defining columns, the following suggestions may help:
+
+1. Key columns must come before all Value columns.
+2. Prefer integer types where possible, since computation and lookups on integers are far more efficient than on strings.
+3. When choosing among integer types of different widths, follow the principle of **just enough**.
+4. For the lengths of VARCHAR and STRING types, likewise follow **just enough**.
+5. The total byte length of all columns (Key and Value) cannot exceed 100KB.
+
+### Partitioning and Bucketing
+
+Doris supports two levels of data division. The first level is the Partition, which supports Range and List partitioning. The second level is the Bucket (Tablet), which supports only Hash partitioning.
+
+You can also use a single level of division; in that case, only Bucket division is supported. Below we introduce partitions and buckets respectively:
+
+1. **Partition**
+
+   - One or more partition columns can be specified; they must be KEY columns. The use of multi-column partitioning is introduced in the **multi-column partitioning** parts below.
+   - Regardless of the partition column's type, partition values must be written in double quotes.
+   - There is theoretically no upper limit on the number of partitions.
+   - When a table is created without partitions, the system automatically generates a full-range Partition with the same name as the table. It is invisible to users and cannot be dropped or modified.
+   - Partitions with **overlapping ranges cannot be added**.
+
+   **Range partitioning**
+
+   - The partition column is usually a time column, for convenient management of old and new data.
+
+   - A Partition supports specifying only the upper bound via `VALUES LESS THAN (...)`, in which case the system takes the previous partition's upper bound as this partition's lower bound, producing a left-closed, right-open interval. Specifying both bounds via `VALUES [...)` is also supported, likewise producing a left-closed, right-open interval.
+
+   - Specifying both bounds via `VALUES [...)` is easier to understand, so here we use examples to show how partition ranges change when partitions are added and dropped with `VALUES LESS THAN (...)`:
+
+     - As in the `expamle_range_tbl` example above, the following 3 partitions are generated automatically when the table is created:
+
+ ```text
+ p201701: [MIN_VALUE, 2017-02-01)
+ p201702: [2017-02-01, 2017-03-01)
+ p201703: [2017-03-01, 2017-04-01)
+ ```
+
+     - When we add partition p201705 VALUES LESS THAN ("2017-06-01"), the partitions become:
+
+ ```text
+ p201701: [MIN_VALUE, 2017-02-01)
+ p201702: [2017-02-01, 2017-03-01)
+ p201703: [2017-03-01, 2017-04-01)
+ p201705: [2017-04-01, 2017-06-01)
+ ```
+
+     - If we then drop partition p201703, the partitions become:
+
+ ```text
+ p201701: [MIN_VALUE, 2017-02-01)
+ p201702: [2017-02-01, 2017-03-01)
+ p201705: [2017-04-01, 2017-06-01)
+ ```
+
+       > Note that the ranges of p201702 and p201705 have not changed, while a hole has appeared between them: [2017-03-01, 2017-04-01). Data whose range falls inside this hole cannot be imported.
+
+     - Dropping partition p201702 as well, the partitions become:
+
+ ```text
+ p201701: [MIN_VALUE, 2017-02-01)
+ p201705: [2017-04-01, 2017-06-01)
+ ```
+
+       > The hole expands to: [2017-02-01, 2017-04-01)
+
+     - Now adding partition p201702new VALUES LESS THAN ("2017-03-01"), the partitions become:
+
+ ```text
+ p201701: [MIN_VALUE, 2017-02-01)
+ p201702new: [2017-02-01, 2017-03-01)
+ p201705: [2017-04-01, 2017-06-01)
+ ```
+
+       > The hole shrinks to: [2017-03-01, 2017-04-01)
+
+     - Now dropping partition p201701 and adding partition p201612 VALUES LESS THAN ("2017-01-01"), the partitions become:
+
+ ```text
+ p201612: [MIN_VALUE, 2017-01-01)
+ p201702new: [2017-02-01, 2017-03-01)
+ p201705: [2017-04-01, 2017-06-01)
+ ```
+
+       > A new hole appears: [2017-01-01, 2017-02-01)
+
+   In summary, dropping a partition does not change the ranges of existing partitions, though it may leave holes. When a partition is added with a `VALUES LESS THAN` clause, its lower bound directly follows the upper bound of the previous partition. The statements behind this walkthrough would look like the sketch below.
+
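+   A minimal sketch of the add/drop statements used in the walkthrough above (the table name comes from the earlier example; per-partition options are omitted):
+
+   ```sql
+   -- Add a partition whose lower bound follows the previous upper bound.
+   ALTER TABLE example_db.expamle_range_tbl ADD PARTITION p201705 VALUES LESS THAN ("2017-06-01");
+   -- Drop a partition; this may leave a hole in the covered ranges.
+   ALTER TABLE example_db.expamle_range_tbl DROP PARTITION p201703;
+   ```
+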
+   Besides the single-column partitioning shown above, Range partitioning also supports **multi-column partitioning**, for example:
+
+ ```text
+ PARTITION BY RANGE(`date`, `id`)
+ (
+ PARTITION `p201701_1000` VALUES LESS THAN ("2017-02-01", "1000"),
+ PARTITION `p201702_2000` VALUES LESS THAN ("2017-03-01", "2000"),
+ PARTITION `p201703_all` VALUES LESS THAN ("2017-04-01")
+ )
+ ```
+
+   In the example above, we specify `date` (DATE type) and `id` (INT type) as the partition columns. The resulting partitions are:
+
+ ```
+ * p201701_1000: [(MIN_VALUE, MIN_VALUE), ("2017-02-01", "1000") )
+ * p201702_2000: [("2017-02-01", "1000"), ("2017-03-01", "2000") )
+ * p201703_all: [("2017-03-01", "2000"), ("2017-04-01", MIN_VALUE))
+ ```
+
+   Note that for the last partition the user specified only the `date` column's partition value, so the `id` column's value is filled with `MIN_VALUE` by default. When data is inserted, the partition column values are compared in order to determine the target partition. For example:
+
+   ```
+   * Data              -->  Partition
+   * 2017-01-01, 200   --> p201701_1000
+   * 2017-01-01, 2000  --> p201701_1000
+   * 2017-02-01, 100   --> p201701_1000
+   * 2017-02-01, 2000  --> p201702_2000
+   * 2017-02-15, 5000  --> p201702_2000
+   * 2017-03-01, 2000  --> p201703_all
+   * 2017-03-10, 1     --> p201703_all
+   * 2017-04-01, 1000  --> cannot be imported
+   * 2017-05-01, 1000  --> cannot be imported
+   ```
+
+   **List partitioning**
+
+   - Partition columns support the `BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, DATE, DATETIME, CHAR, VARCHAR` data types, and partition values are enumerated values. A row hits a partition only when its value is one of that partition's enumerated values.
+
+   - Partitions specify the enumerated values they contain via `VALUES IN (...)`.
+
+   - The following example shows how partitions change when they are added and dropped.
+
+     - As in the `expamle_list_tbl` example above, the following 3 partitions are generated automatically when the table is created:
+
+ ```text
+ p_cn: ("Beijing", "Shanghai", "Hong Kong")
+ p_usa: ("New York", "San Francisco")
+ p_jp: ("Tokyo")
+ ```
+
+     - When we add partition p_uk VALUES IN ("London"), the partitions become:
+
+ ```text
+ p_cn: ("Beijing", "Shanghai", "Hong Kong")
+ p_usa: ("New York", "San Francisco")
+ p_jp: ("Tokyo")
+ p_uk: ("London")
+ ```
+
+     - When we drop partition p_jp, the partitions become:
+
+ ```text
+ p_cn: ("Beijing", "Shanghai", "Hong Kong")
+ p_usa: ("New York", "San Francisco")
+ p_uk: ("London")
+ ```
+
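+     As a sketch, the corresponding statements would be:
+
+     ```sql
+     ALTER TABLE example_db.expamle_list_tbl ADD PARTITION p_uk VALUES IN ("London");
+     ALTER TABLE example_db.expamle_list_tbl DROP PARTITION p_jp;
+     ```
+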
+   List partitioning also supports **multi-column partitioning**, for example:
+
+ ```text
+ PARTITION BY LIST(`id`, `city`)
+ (
+ PARTITION `p1_city` VALUES IN (("1", "Beijing"), ("1", "Shanghai")),
+ PARTITION `p2_city` VALUES IN (("2", "Beijing"), ("2", "Shanghai")),
+ PARTITION `p3_city` VALUES IN (("3", "Beijing"), ("3", "Shanghai"))
+ )
+ ```
+
+   In the example above, we specify `id` (INT type) and `city` (VARCHAR type) as the partition columns. The resulting partitions are:
+
+ ```
+ * p1_city: [("1", "Beijing"), ("1", "Shanghai")]
+ * p2_city: [("2", "Beijing"), ("2", "Shanghai")]
+ * p3_city: [("3", "Beijing"), ("3", "Shanghai")]
+ ```
+
+   When data is inserted, the partition column values are compared in order to determine the target partition. For example:
+
+   ```
+   * Data        --->  Partition
+   * 1, Beijing  ---> p1_city
+   * 1, Shanghai ---> p1_city
+   * 2, Shanghai ---> p2_city
+   * 3, Beijing  ---> p3_city
+   * 1, Tianjin  ---> cannot be imported
+   * 4, Beijing  ---> cannot be imported
+   ```
+
+2. **Bucket**
+
+   - If Partition is used, the `DISTRIBUTED ...` clause describes how data is divided **within each partition**; otherwise, it describes how the whole table's data is divided.
+   - Multiple bucketing columns can be used, but they must be Key columns. Bucketing columns may be the same as or different from the Partition columns.
+   - The choice of bucketing columns is a trade-off between **query throughput** and **query concurrency**:
+     1. Choosing several bucketing columns distributes the data more evenly. If a query's conditions do not include equality conditions on all bucketing columns, the query scans all buckets simultaneously, which increases throughput and lowers the latency of a single query. This suits high-throughput, low-concurrency scenarios.
+     2. Choosing only one or a few bucketing columns lets a point query scan just a single bucket. With many concurrent point queries, the queries are likely to scan different buckets and interfere little with each other's IO (especially when buckets sit on different disks), so this suits high-concurrency point-query scenarios.
+   - There is theoretically no upper limit on the number of buckets.
+
+3. **Suggestions on the number and data volume of Partitions and Buckets.**
+
+   - The total number of Tablets in a table equals (Partition num * Bucket num).
+   - Without considering scale-out, the recommended number of Tablets for a table is slightly more than the number of disks in the whole cluster.
+   - There is theoretically no upper or lower bound on a single Tablet's data volume, but 1GB - 10GB is recommended. If a Tablet holds too little data, aggregation is poor and metadata management pressure grows; if too much, replica migration and repair suffer, and the cost of retrying failed Schema Change or Rollup operations rises (these operations retry at Tablet granularity).
+   - When the data-volume principle and the count principle conflict, the data-volume principle should take priority.
+   - At table creation, the bucket count is specified uniformly for every partition, but when adding partitions dynamically (`ADD PARTITION`), the new partition's bucket count can be specified separately. This is handy for coping with shrinking or growing data; see the sketch below.
+   - Once specified, a Partition's bucket count cannot be changed, so cluster scale-out must be considered when choosing it. For example, with only 3 hosts of 1 disk each, setting the bucket count to 3 or less means that adding machines later cannot improve concurrency.
+   - Some examples: assume 10 BEs, each with one disk. For a table totaling 500MB, consider 4-8 tablets; for 5GB, 8-16 tablets; for 50GB, 32 tablets; for 500GB, partition the table, with each partition around 50GB and 16-32 tablets per partition; for 5TB, likewise partition the table, with each partition around 50GB and 16-32 tablets per partition.
+
+   > Note: a table's data volume can be viewed with the [`SHOW DATA`](../sql-manual/sql-reference-v2/Show-Statements/SHOW-DATA.html) command; dividing the result by the number of replicas gives the table's data volume.
+
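+   A hedged sketch of adding a partition with its own bucket count (names reuse the range example above):
+
+   ```sql
+   -- The new partition overrides the table-level bucket count of 16.
+   ALTER TABLE example_db.expamle_range_tbl
+   ADD PARTITION p201704 VALUES LESS THAN ("2017-05-01")
+   DISTRIBUTED BY HASH(`user_id`) BUCKETS 32;
+   ```
+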
+#### Composite vs. Single Partitioning
+
+Composite partitioning:
+
+- The first level is called Partition. A dimension column can be designated as the partition column (currently only integer and time-type columns are supported), along with the value range of each partition.
+- The second level is called Distribution, i.e. bucketing. One or more dimension columns plus a bucket count HASH-distribute the data.
+
+Composite partitioning is recommended in the following scenarios:
+
+- There is a time dimension or a similar dimension with ordered values; such a column can serve as the partition column. Partition granularity can be judged by import frequency, partition data volume, and so on.
+- Historical data needs to be deleted (for example, keeping only the last N days of data). With composite partitioning, this is achieved by dropping historical partitions; data can also be deleted by sending DELETE statements against specified partitions.
+- Data skew needs to be addressed: each partition can specify its own bucket count. For example, when partitioning by day and daily data volumes vary greatly, per-partition bucket counts can divide the data sensibly. Columns with high cardinality are recommended as bucketing columns.
+
+You may also skip composite partitioning, i.e. use single partitioning; then the data is only HASH-distributed.
+
+### PROPERTIES
+
+For the parameters that can be set in the PROPERTIES clause at the end of the CREATE TABLE statement, see the detailed description in [CREATE TABLE](../sql-manual/sql-reference-v2/Data-Definition-Statements/Create/CREATE-TABLE.html).
+
+### ENGINE
+
+In this example, the ENGINE type is olap, the default. In Doris, only this ENGINE type has its data managed and stored by Doris itself. Other ENGINE types, such as mysql, broker, and es, are essentially just mappings to tables in other external databases or systems, so that Doris can read that data; Doris itself does not create, manage, or store tables or data of any non-olap ENGINE type.
+
+### Others
+
+`IF NOT EXISTS` means the table is created only if it does not already exist. Note that this only checks whether the table name exists, not whether the new schema matches that of an existing table. So if a table with the same name but a different schema exists, the command still returns success, which does not mean a new table with the new schema has been created.
+
+## FAQ
+
+### Common Table Creation Issues
+
+1. If a syntax error occurs in a long CREATE TABLE statement, the error message may be incomplete. Possible syntax errors are listed here for manual checking:
+
+   - Incorrect syntax structure. Read `HELP CREATE TABLE;` carefully and check the relevant syntax.
+   - Reserved words. When a user-defined name collides with a reserved word, it must be enclosed in backticks ``. It is recommended to enclose all user-defined names this way.
+   - Chinese or full-width characters. Non-UTF-8 Chinese characters, or hidden full-width characters (spaces, punctuation, etc.), cause syntax errors. Checking with a text editor that displays invisible characters is recommended.
+
+2. `Failed to create partition [xxx] . Timeout`
+
+   Doris creates tables sequentially at Partition granularity, and this error may be reported when one Partition fails to be created. Even without user-defined Partitions, `Failed to create partition` can appear when table creation fails, because, as described earlier, Doris creates an unmodifiable default Partition for tables with no Partition specified.
+
+   This error usually means the BE ran into a problem while creating data shards. You can troubleshoot as follows:
+
+   - In fe.log, find the `Failed to create partition` entry at the corresponding time. It contains a series of number pairs like `{10001-10010}`: the first number is a Backend ID and the second a Tablet ID. This pair means that creating the Tablet with ID 10010 failed on the Backend with ID 10001.
+   - Go to that Backend's be.INFO log and search for tablet-id-related entries in the corresponding time window to find the error message.
+   - Common tablet creation failures include, but are not limited to:
+     - The BE did not receive the task, in which case no tablet-id-related entries appear in be.INFO; or the BE created the tablet but failed to report it. For these problems, see [Installation and Deployment](../install/install-deploy.html) to check FE-BE connectivity.
+     - Memory pre-allocation failed, possibly because a row in the table exceeds 100KB in byte length.
+     - `Too many open files`: the number of open file handles exceeds the Linux system limit, which must be raised.
+
+   If shard creation times out, the timeout can also be extended by setting `tablet_create_timeout_second=xxx` and `max_create_table_timeout_second=xxx` in fe.conf. `tablet_create_timeout_second` defaults to 1 second and `max_create_table_timeout_second` to 60 seconds; the overall timeout is min(tablet_create_timeout_second * replication_num, max_create_table_timeout_second). See [FE configuration](../admin-manual/config/fe-config.html) for the specific settings.
+
+3. The CREATE TABLE command does not return for a long time.
+
+   Doris's CREATE TABLE command is synchronous. Its timeout is currently computed quite simply, namely (tablet num * replication num) seconds. If many data shards are created and some of them fail, the command may wait for a long timeout before returning an error.
+
+   Normally, a CREATE TABLE statement returns within seconds or tens of seconds. If it takes more than a minute, it is recommended to cancel the operation and check the FE or BE logs for the relevant errors.
+
+## More Help
+
+For a more detailed description of data partitioning, consult the [CREATE TABLE](../sql-manual/sql-reference-v2/Data-Definition-Statements/Create/CREATE-TABLE.html) command manual, or type `HELP CREATE TABLE;` in the MySQL client for more help.
diff --git a/new-docs/zh-CN/data-table/hit-the-rollup.md
b/new-docs/zh-CN/data-table/hit-the-rollup.md
index b4dee45cc7..d398d72a5a 100644
--- a/new-docs/zh-CN/data-table/hit-the-rollup.md
+++ b/new-docs/zh-CN/data-table/hit-the-rollup.md
@@ -24,4 +24,429 @@ specific language governing permissions and limitations
under the License.
-->
-# Rollup and Query
\ No newline at end of file
+# Rollup and Query
+
+In multidimensional analysis, ROLLUP means "rolling up", i.e. further aggregating data at a specified granularity.
+
+## Basic Concepts
+
+In Doris, the table created by the user's CREATE TABLE statement is called the Base table. It holds the base data, stored in the way specified by that statement.
+
+On top of the Base table, we can create any number of ROLLUP tables. ROLLUP data is derived from the Base table and is **stored independently** at the physical level.
+
+The basic purpose of a ROLLUP table is to provide coarser-grained aggregated data on top of the Base table.
+
+Below we use examples to explain ROLLUP tables and their roles in the different data models.
+
+### ROLLUP in the Aggregate and Unique Models
+
+Since Unique is just a special case of the Aggregate model, we do not distinguish them here.
+
+1. Example 1: obtain each user's total spending
+
+Following **Example 2** of the **[Aggregate model](data-model.html#Aggregate 模型)** section of the data model doc, the Base table structure is as follows:
+
+| ColumnName      | Type        | AggregationType | Comment                       |
+| --------------- | ----------- | --------------- | ----------------------------- |
+| user_id         | LARGEINT    |                 | user id                       |
+| date            | DATE        |                 | data load date                |
+| timestamp       | DATETIME    |                 | data load time, to the second |
+| city            | VARCHAR(20) |                 | user's city                   |
+| age             | SMALLINT    |                 | user age                      |
+| sex             | TINYINT     |                 | user gender                   |
+| last_visit_date | DATETIME    | REPLACE         | user's last visit time        |
+| cost            | BIGINT      | SUM             | user's total spending         |
+| max_dwell_time  | INT         | MAX             | user's maximum dwell time     |
+| min_dwell_time  | INT         | MIN             | user's minimum dwell time     |
+
+The stored data is as follows:
+
+| user_id | date       | timestamp           | city      | age  | sex  | last_visit_date     | cost | max_dwell_time | min_dwell_time |
+| ------- | ---------- | ------------------- | --------- | ---- | ---- | ------------------- | ---- | -------------- | -------------- |
+| 10000   | 2017-10-01 | 2017-10-01 08:00:05 | Beijing   | 20   | 0    | 2017-10-01 06:00:00 | 20   | 10             | 10             |
+| 10000   | 2017-10-01 | 2017-10-01 09:00:05 | Beijing   | 20   | 0    | 2017-10-01 07:00:00 | 15   | 2              | 2              |
+| 10001   | 2017-10-01 | 2017-10-01 18:12:10 | Beijing   | 30   | 1    | 2017-10-01 17:05:45 | 2    | 22             | 22             |
+| 10002   | 2017-10-02 | 2017-10-02 13:10:00 | Shanghai  | 20   | 1    | 2017-10-02 12:59:12 | 200  | 5              | 5              |
+| 10003   | 2017-10-02 | 2017-10-02 13:15:00 | Guangzhou | 32   | 0    | 2017-10-02 11:20:00 | 30   | 11             | 11             |
+| 10004   | 2017-10-01 | 2017-10-01 12:12:48 | Shenzhen  | 35   | 0    | 2017-10-01 10:00:15 | 100  | 3              | 3              |
+| 10004   | 2017-10-03 | 2017-10-03 12:38:20 | Shenzhen  | 35   | 0    | 2017-10-03 10:20:22 | 11   | 6              | 6              |
+
+On this basis, we create a ROLLUP:
+
+| ColumnName |
+| ---------- |
+| user_id |
+| cost |
+
+This ROLLUP contains only two columns, user_id and cost. After it is created, the data stored in the ROLLUP is as follows:
+
+| user_id | cost |
+| ------- | ---- |
+| 10000 | 35 |
+| 10001 | 2 |
+| 10002 | 200 |
+| 10003 | 30 |
+| 10004 | 111 |
+
+As you can see, the ROLLUP retains only the SUM over the cost column for each user_id. So when we run the following query:
+
+```sql
+SELECT user_id, sum(cost) FROM table GROUP BY user_id;
+```
+
+Doris automatically hits this ROLLUP table, so the aggregation query completes after scanning only a tiny amount of data.
+
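+A hedged sketch of the DDL that would create this ROLLUP (the rollup name is illustrative; `table` is the placeholder table name from the query above):
+
+```sql
+ALTER TABLE `table` ADD ROLLUP rollup_cost(user_id, cost);
+```
+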
+2. Example 2: obtain the total spending and the longest and shortest page dwell times of users of different cities and age groups
+
+Continuing from Example 1, we create another ROLLUP on top of the Base table:
+
+| ColumnName     | Type        | AggregationType | Comment                   |
+| -------------- | ----------- | --------------- | ------------------------- |
+| city           | VARCHAR(20) |                 | user's city               |
+| age            | SMALLINT    |                 | user age                  |
+| cost           | BIGINT      | SUM             | user's total spending     |
+| max_dwell_time | INT         | MAX             | user's maximum dwell time |
+| min_dwell_time | INT         | MIN             | user's minimum dwell time |
+
+After it is created, the data stored in the ROLLUP is as follows:
+
+| city      | age  | cost | max_dwell_time | min_dwell_time |
+| --------- | ---- | ---- | -------------- | -------------- |
+| Beijing   | 20   | 35   | 10             | 2              |
+| Beijing   | 30   | 2    | 22             | 22             |
+| Shanghai  | 20   | 200  | 5              | 5              |
+| Guangzhou | 32   | 30   | 11             | 11             |
+| Shenzhen  | 35   | 111  | 6              | 3              |
+
+When we run queries such as these:
+
+```sql
+mysql> SELECT city, age, sum(cost), max(max_dwell_time), min(min_dwell_time)
FROM table GROUP BY city, age;
+mysql> SELECT city, sum(cost), max(max_dwell_time), min(min_dwell_time) FROM
table GROUP BY city;
+mysql> SELECT city, age, sum(cost), min(min_dwell_time) FROM table GROUP BY
city, age;
+```
+
+Doris automatically hits this ROLLUP table when executing these SQL statements.
+
+### ROLLUP in the Duplicate Model
+
+Because the Duplicate model has no aggregation semantics, ROLLUP in this model loses the "roll-up" meaning and serves purely to adjust column order so as to hit the prefix index. **[The prefix index section of the data model doc](data-model.html#前缀索引)** introduces the prefix index in detail, and how to use ROLLUP to change the prefix index for better query efficiency.
+
+## Adjusting the Prefix Index with ROLLUP
+
+Because the column order is fixed at table creation, a table has only one kind of prefix index. For queries conditioned on other columns that cannot hit the prefix index, the efficiency may be unsatisfactory. We can therefore create a ROLLUP to adjust the column order deliberately. An example:
+
+The Base table structure is as follows:
+
+| ColumnName | Type |
+| -------------- | ------------ |
+| user_id | BIGINT |
+| age | INT |
+| message | VARCHAR(100) |
+| max_dwell_time | DATETIME |
+| min_dwell_time | DATETIME |
+
+On this basis we can create a ROLLUP table:
+
+| ColumnName | Type |
+| -------------- | ------------ |
+| age | INT |
+| user_id | BIGINT |
+| message | VARCHAR(100) |
+| max_dwell_time | DATETIME |
+| min_dwell_time | DATETIME |
+
+As you can see, the ROLLUP has exactly the same columns as the Base table, only with the order of user_id and age swapped. So when we run the following query:
+
+```sql
+mysql> SELECT * FROM table where age=20 and message LIKE "%error%";
+```
+
+the ROLLUP table is preferred, because its prefix index matches better. A sketch of the corresponding DDL follows.
+
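+The rollup name below is illustrative, and `table` again stands for the Base table:
+
+```sql
+-- Put age first so it leads the ROLLUP's prefix index.
+ALTER TABLE `table` ADD ROLLUP rollup_age_first(age, user_id, message, max_dwell_time, min_dwell_time);
+```
+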
+## Notes on Using ROLLUP
+
+- The most fundamental role of a ROLLUP is to improve the efficiency of certain queries, whether by reducing data volume through aggregation or by reordering columns to match the prefix index. Its meaning thus goes beyond "rolling up", which is why it is named Materialized Index in the source code.
+- A ROLLUP is attached to the Base table and can be seen as an auxiliary data structure of it. ROLLUPs can be created and dropped on top of a Base table, but a query cannot explicitly target a specific ROLLUP; whether a ROLLUP is hit is decided entirely and automatically by the Doris system.
+- ROLLUP data is stored physically and independently, so the more ROLLUPs created, the more disk space is used. Import speed is also affected (the ETL stage of an import automatically generates data for all ROLLUPs), but query efficiency is never reduced (it can only improve).
+- ROLLUP data stays fully in sync with the Base table; users need not worry about this.
+- The aggregation method of each ROLLUP column is exactly the same as in the Base table; it need not be specified when creating a ROLLUP and cannot be modified.
+- A necessary (but not sufficient) condition for a query to hit a ROLLUP is that **all columns** involved in the query (including those in the select list and the where conditions) exist among that ROLLUP's columns; otherwise, the query can only hit the Base table.
+- Certain query types, such as count(*), can never hit a ROLLUP under any conditions; see the upcoming section on **limitations of the aggregate model**.
+- The query execution plan can be obtained with `EXPLAIN your_sql;`, and the plan shows whether a ROLLUP is hit.
+- `DESC tbl_name ALL;` displays the Base table and all ROLLUPs that have been created.
+
+## Queries
+
+In Doris, a Rollup, acting as an aggregate materialized view, can play two roles in queries:
+
+- Index
+- Aggregated data (only for the aggregate model, i.e. aggregate key)
+
+However, certain conditions must be met to hit a Rollup. Whether pre-aggregation can be used is shown by the PreAggregation value of the ScanNode in the execution plan, and which Rollup table is hit is shown by its rollup field.
+
+### Index
+
+Doris's prefix index was introduced in the [data model](data-model.html) doc: Doris takes the first 36 bytes of a Base/Rollup table (a varchar column may leave the prefix index shorter than 36 bytes, since varchar truncates the prefix index and uses at most 20 of its bytes) and generates, in the underlying storage engine, a separate sorted sparse index over them (the data is sorted as well; the index locates a position, then a binary search is done within the data). At query time, the conditions in the query are matched against each Base/Rollup's prefix index, and the Base/Rollup with the longest prefix-index match is selected.
+
+```text
+       -----> match from left to right
++----+----+----+----+----+----+
+| c1 | c2 | c3 | c4 | c5 |... |
+```
+
+As shown above, the where and on conditions pushed down to the ScanNode are matched starting from the first column of the prefix index: if a column appears in the conditions, the matched length accumulates, until a column fails to match or 36 bytes are reached (a varchar column can match at most 20 bytes and, when fewer than 36 bytes match, truncates the prefix index). The Base/Rollup with the longest matched length is then selected. The following example uses one Base table and four rollups:
+
+```text
++---------------+-------+--------------+------+-------+---------+-------+
+| IndexName | Field | Type | Null | Key | Default | Extra |
++---------------+-------+--------------+------+-------+---------+-------+
+| test | k1 | TINYINT | Yes | true | N/A | |
+| | k2 | SMALLINT | Yes | true | N/A | |
+| | k3 | INT | Yes | true | N/A | |
+| | k4 | BIGINT | Yes | true | N/A | |
+| | k5 | DECIMAL(9,3) | Yes | true | N/A | |
+| | k6 | CHAR(5) | Yes | true | N/A | |
+| | k7 | DATE | Yes | true | N/A | |
+| | k8 | DATETIME | Yes | true | N/A | |
+| | k9 | VARCHAR(20) | Yes | true | N/A | |
+| | k10 | DOUBLE | Yes | false | N/A | MAX |
+| | k11 | FLOAT | Yes | false | N/A | SUM |
+| | | | | | | |
+| rollup_index1 | k9 | VARCHAR(20) | Yes | true | N/A | |
+| | k1 | TINYINT | Yes | true | N/A | |
+| | k2 | SMALLINT | Yes | true | N/A | |
+| | k3 | INT | Yes | true | N/A | |
+| | k4 | BIGINT | Yes | true | N/A | |
+| | k5 | DECIMAL(9,3) | Yes | true | N/A | |
+| | k6 | CHAR(5) | Yes | true | N/A | |
+| | k7 | DATE | Yes | true | N/A | |
+| | k8 | DATETIME | Yes | true | N/A | |
+| | k10 | DOUBLE | Yes | false | N/A | MAX |
+| | k11 | FLOAT | Yes | false | N/A | SUM |
+| | | | | | | |
+| rollup_index2 | k9 | VARCHAR(20) | Yes | true | N/A | |
+| | k2 | SMALLINT | Yes | true | N/A | |
+| | k1 | TINYINT | Yes | true | N/A | |
+| | k3 | INT | Yes | true | N/A | |
+| | k4 | BIGINT | Yes | true | N/A | |
+| | k5 | DECIMAL(9,3) | Yes | true | N/A | |
+| | k6 | CHAR(5) | Yes | true | N/A | |
+| | k7 | DATE | Yes | true | N/A | |
+| | k8 | DATETIME | Yes | true | N/A | |
+| | k10 | DOUBLE | Yes | false | N/A | MAX |
+| | k11 | FLOAT | Yes | false | N/A | SUM |
+| | | | | | | |
+| rollup_index3 | k4 | BIGINT | Yes | true | N/A | |
+| | k5 | DECIMAL(9,3) | Yes | true | N/A | |
+| | k6 | CHAR(5) | Yes | true | N/A | |
+| | k1 | TINYINT | Yes | true | N/A | |
+| | k2 | SMALLINT | Yes | true | N/A | |
+| | k3 | INT | Yes | true | N/A | |
+| | k7 | DATE | Yes | true | N/A | |
+| | k8 | DATETIME | Yes | true | N/A | |
+| | k9 | VARCHAR(20) | Yes | true | N/A | |
+| | k10 | DOUBLE | Yes | false | N/A | MAX |
+| | k11 | FLOAT | Yes | false | N/A | SUM |
+| | | | | | | |
+| rollup_index4 | k4 | BIGINT | Yes | true | N/A | |
+| | k6 | CHAR(5) | Yes | true | N/A | |
+| | k5 | DECIMAL(9,3) | Yes | true | N/A | |
+| | k1 | TINYINT | Yes | true | N/A | |
+| | k2 | SMALLINT | Yes | true | N/A | |
+| | k3 | INT | Yes | true | N/A | |
+| | k7 | DATE | Yes | true | N/A | |
+| | k8 | DATETIME | Yes | true | N/A | |
+| | k9 | VARCHAR(20) | Yes | true | N/A | |
+| | k10 | DOUBLE | Yes | false | N/A | MAX |
+| | k11 | FLOAT | Yes | false | N/A | SUM |
++---------------+-------+--------------+------+-------+---------+-------+
+```
+
+The prefix indexes of these five tables are:
+
+```text
+Base(k1 ,k2, k3, k4, k5, k6, k7)
+
+rollup_index1(k9)
+
+rollup_index2(k9)
+
+rollup_index3(k4, k5, k6, k1, k2, k3, k7)
+
+rollup_index4(k4, k6, k5, k1, k2, k3, k7)
+```
+
+For conditions on a column to use the prefix index, they must be among `=`, `<`, `>`, `<=`, `>=`, `in`, and `between`, appear side by side, and be connected with `and`; conditions using `or`, `!=`, and the like cannot hit it. Now consider the following query:
+
+```sql
+SELECT * FROM test WHERE k1 = 1 AND k2 > 3;
+```
+
+There are conditions on k1 and k2. Checking first columns, only the Base table's first column contains the k1 from the conditions, so the longest prefix-index match is test. Let's explain it:
+
+```text
+| 0:OlapScanNode
+| TABLE: test
+| PREAGGREGATION: OFF. Reason: No AggregateInfo
+| PREDICATES: `k1` = 1, `k2` > 3
+| partitions=1/1
+| rollup: test
+| buckets=1/10
+| cardinality=-1
+| avgRowSize=0.0
+| numNodes=0
+| tuple ids: 0
+```
+
+Now look at the following query:
+
+```sql
+SELECT * FROM test WHERE k4 = 1 AND k5 > 3;
+```
+
+There are conditions on k4 and k5. The first column of both rollup_index3 and rollup_index4 contains k4, but rollup_index3's second column also contains k5, so it has the longest prefix-index match.
+
+```text
+| 0:OlapScanNode
+| TABLE: test
+| PREAGGREGATION: OFF. Reason: No AggregateInfo
+| PREDICATES: `k4` = 1, `k5` > 3
+| partitions=1/1
+| rollup: rollup_index3
+| buckets=10/10
+| cardinality=-1
+| avgRowSize=0.0
+| numNodes=0
+| tuple ids: 0
+```
+
+Now let's try matching conditions involving a varchar column:
+
+```sql
+SELECT * FROM test WHERE k9 IN ("xxx", "yyyy") AND k1 = 10;
+```
+
+There are two conditions, on k9 and k1. The first column of both rollup_index1 and rollup_index2 contains k9; in principle either rollup could be chosen with the same prefix-index effect, and one could simply be picked at random (because the varchar here is exactly 20 bytes, the prefix index is truncated below 36 bytes). The current strategy, however, continues matching with k1: since rollup_index1's second column is k1, rollup_index1 is chosen, even though the trailing k1 condition adds no acceleration. (If conditions beyond the prefix index need to accelerate a query, a BloomFilter index can be built. This is generally worthwhile only for string columns, since Doris already maintains Block-level Min/Max indexes for integer and date columns.) The explain result follows.
+
+```text
+| 0:OlapScanNode
+| TABLE: test
+| PREAGGREGATION: OFF. Reason: No AggregateInfo
+| PREDICATES: `k9` IN ('xxx', 'yyyy'), `k1` = 10
+| partitions=1/1
+| rollup: rollup_index1
+| buckets=1/10
+| cardinality=-1
+| avgRowSize=0.0
+| numNodes=0
+| tuple ids: 0
+```
+
+Finally, look at a query that more than one Rollup can hit:
+
+```sql
+SELECT * FROM test WHERE k4 < 1000 AND k5 = 80 AND k6 >= 10000;
+```
+
+There are three conditions, on k4, k5, and k6. The first three columns of both rollup_index3 and rollup_index4 contain these columns, so their matched prefix-index lengths are equal and either could be chosen. The current default strategy picks the rollup created earlier, here rollup_index3.
+
+```text
+| 0:OlapScanNode
+| TABLE: test
+| PREAGGREGATION: OFF. Reason: No AggregateInfo
+| PREDICATES: `k4` < 1000, `k5` = 80, `k6` >= 10000.0
+| partitions=1/1
+| rollup: rollup_index3
+| buckets=10/10
+| cardinality=-1
+| avgRowSize=0.0
+| numNodes=0
+| tuple ids: 0
+```
+
+If the query above is modified slightly as follows:
+
+```sql
+SELECT * FROM test WHERE k4 < 1000 AND k5 = 80 OR k6 >= 10000;
+```
+
+then the query cannot hit the prefix index. (In fact, no Min/Max or BloomFilter index inside the Doris storage engine will take effect either.)
+
+### Aggregated Data
+
+Of course, the data-aggregation capability is indispensable for a typical aggregate materialized view. Such materialized views are very helpful for aggregation and report queries. Hitting an aggregate materialized view requires the following prerequisites:
+
+1. All columns involved in the query or subquery exist in a single Rollup.
+2. If the query or subquery contains a Join, its type must be an Inner Join.
+
+The following kinds of aggregation queries can hit a Rollup:
+
+| Column type \ Query type | Sum   | Distinct/Count Distinct | Min   | Max   | APPROX_COUNT_DISTINCT |
+| ------------------------ | ----- | ----------------------- | ----- | ----- | --------------------- |
+| Key                      | false | true                    | true  | true  | true                  |
+| Value(Sum)               | true  | false                   | false | false | false                 |
+| Value(Replace)           | false | false                   | false | false | false                 |
+| Value(Min)               | false | false                   | true  | false | false                 |
+| Value(Max)               | false | false                   | false | true  | false                 |
+
+If the above conditions are met, Rollup selection for the aggregate model happens in two stages:
+
+1. First, match the conditions to find the Rollup tables with the longest prefix-index hit, per the index strategy above.
+2. Then compare the Rollups' row counts and choose the smallest Rollup.
+
+Given the following Base table and Rollups:
+
+```text
++-------------+-------+--------------+------+-------+---------+-------+
+| IndexName | Field | Type | Null | Key | Default | Extra |
++-------------+-------+--------------+------+-------+---------+-------+
+| test_rollup | k1 | TINYINT | Yes | true | N/A | |
+| | k2 | SMALLINT | Yes | true | N/A | |
+| | k3 | INT | Yes | true | N/A | |
+| | k4 | BIGINT | Yes | true | N/A | |
+| | k5 | DECIMAL(9,3) | Yes | true | N/A | |
+| | k6 | CHAR(5) | Yes | true | N/A | |
+| | k7 | DATE | Yes | true | N/A | |
+| | k8 | DATETIME | Yes | true | N/A | |
+| | k9 | VARCHAR(20) | Yes | true | N/A | |
+| | k10 | DOUBLE | Yes | false | N/A | MAX |
+| | k11 | FLOAT | Yes | false | N/A | SUM |
+| | | | | | | |
+| rollup2 | k1 | TINYINT | Yes | true | N/A | |
+| | k2 | SMALLINT | Yes | true | N/A | |
+| | k3 | INT | Yes | true | N/A | |
+| | k10 | DOUBLE | Yes | false | N/A | MAX |
+| | k11 | FLOAT | Yes | false | N/A | SUM |
+| | | | | | | |
+| rollup1 | k1 | TINYINT | Yes | true | N/A | |
+| | k2 | SMALLINT | Yes | true | N/A | |
+| | k3 | INT | Yes | true | N/A | |
+| | k4 | BIGINT | Yes | true | N/A | |
+| | k5 | DECIMAL(9,3) | Yes | true | N/A | |
+| | k10 | DOUBLE | Yes | false | N/A | MAX |
+| | k11 | FLOAT | Yes | false | N/A | SUM |
++-------------+-------+--------------+------+-------+---------+-------+
+```
+
+Consider the following query:
+
+```sql
+SELECT SUM(k11) FROM test_rollup WHERE k1 = 10 AND k2 > 200 AND k3 in (1,2,3);
+```
+
+First, Doris checks whether the query can hit an aggregated Rollup table; per the listing above, it can. The conditions involve k1, k2, and k3, which the first three columns of test_rollup, rollup1, and rollup2 all contain, so the prefix-index lengths are equal. Comparing row counts, rollup2 clearly has the highest aggregation degree and the fewest rows, so rollup2 is selected.
+
+```text
+| 0:OlapScanNode |
+| TABLE: test_rollup |
+| PREAGGREGATION: ON |
+| PREDICATES: `k1` = 10, `k2` > 200, `k3` IN (1, 2, 3) |
+| partitions=1/1 |
+| rollup: rollup2 |
+| buckets=1/10 |
+| cardinality=-1 |
+| avgRowSize=0.0 |
+| numNodes=0 |
+| tuple ids: 0 |
+```
\ No newline at end of file
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]