This is an automated email from the ASF dual-hosted git repository.
luzhijing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new db10894c567a [doc]update en docs of data partition (#475)
db10894c567a is described below
commit db10894c567a05a52daabee02b0965cdf984db12
Author: Luzhijing <[email protected]>
AuthorDate: Wed Mar 27 11:28:47 2024 +0800
[doc]update en docs of data partition (#475)
---
.../version-2.0/table-design/data-partition.md | 6 +-
.../version-2.0/table-design/data-partition.md | 1125 +++++++++++++++-----
2 files changed, 878 insertions(+), 253 deletions(-)
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/data-partition.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/data-partition.md
index d58028d40617..774b1beb5e7f 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/data-partition.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/data-partition.md
@@ -402,7 +402,7 @@ PARTITION BY LIST(id, city)
- `dynamic_partition.start_day_of_month`
- 当 `time_unit` 为 `MONTH` 时,该参数用于指定每月的起始日期。取值为 1 到 28。其中 1 表示每月 1 号,28 表示每月 28
号。默认为 1,即表示每月以 1 号位起始点。暂不支持以 29、30、31 号为起始日,以避免因闰年或闰月带来的歧义。
+ 当 `time_unit` 为 `MONTH` 时,该参数用于指定每月的起始日期。取值为 1 到 28。其中 1 表示每月 1 号,28 表示每月 28
号。默认为 1,即表示每月以 1 号为起始点。暂不支持以 29、30、31 号为起始日,以避免因闰年或闰月带来的歧义。
- `dynamic_partition.create_history_partition`
@@ -474,7 +474,7 @@ PARTITION BY LIST(id, city)
假设需要创建的历史分区数量为 `expect_create_partition_num`,根据不同的设置具体数量如下:
- create_history_partition = true
-
+
dynamic_partition.history_partition_num 未设置,即 -1.
expect_create_partition_num = end - start;
dynamic_partition.history_partition_num 已设置 expect_create_partition_num =
end - max(start, -histoty_partition_num);
@@ -578,7 +578,7 @@ PROPERTIES
);
```
-假设当前日期为 2020-05-29,是 2020 年的第 22 周。默认每周起始为星期一。则根于以上规则,tbl1 会产生以下分区:
+假设当前日期为 2020-05-29,是 2020 年的第 22 周。默认每周起始为星期一。则以上规则,tbl1 会产生以下分区:
```Plain
p2020_22: ["2020-05-25 00:00:00", "2020-06-01 00:00:00")
diff --git a/versioned_docs/version-2.0/table-design/data-partition.md
b/versioned_docs/version-2.0/table-design/data-partition.md
index 661bf7cbe9b6..46d400e2f121 100644
--- a/versioned_docs/version-2.0/table-design/data-partition.md
+++ b/versioned_docs/version-2.0/table-design/data-partition.md
@@ -24,44 +24,47 @@ specific language governing permissions and limitations
under the License.
-->
-# Data Partition
+This document mainly introduces table creation and data partitioning in Doris,
as well as potential problems and solutions encountered during table creation
operations.
-This topic is about table creation and data partitioning in Doris, including
the common problems in table creation and their solutions.
+## Basic concepts
-## Basic Concepts
-
-In Doris, data is logically described in the form of table.
+In Doris, data is logically described in the form of tables.
### Row & Column
-A table contains rows and columns.
+A table consists of rows and columns:
-Row refers to a row of user data. Column is used to describe different fields
in a row of data.
+- Row: Represents a single line of user data;
+- Column: Used to describe different fields in a row of data;
+- Columns can be divided into two types: Key and Value. From a business
perspective, Key and Value can correspond to dimension columns and metric
columns, respectively. The key columns in Doris are those specified in the
table creation statement, which are the columns following the keywords `unique
key`, `aggregate key`, or `duplicate key`. The remaining columns are value
columns. From the perspective of the aggregation model, rows with the same Key
columns will be aggregated into a sing [...]
-Columns can be divided into two categories: Key and Value. From a business
perspective, Key and Value correspond to dimension columns and metric columns,
respectively. The key column of Doris is the column specified in the table
creation statement. The column after the keyword 'unique key' or 'aggregate
key' or 'duplicate key' in the table creation statement is the key column, and
the rest except the key column is the value column. In the Aggregate Model,
rows with the same values in Key [...]
+### Partition & Tablet
-### Tablet & Partition
+Doris supports two levels of data partitioning. The first level is Partition, which supports Range and List partitioning. The second level is Bucket (also known as Tablet), which supports Hash and Random bucketing. If no partitioning is specified during table creation, Doris generates a default partition that is transparent to the user. When using the default partition, only bucketing is supported.
-In the Doris storage engine, user data are horizontally divided into data
tablets (also known as data buckets). Each tablet contains several rows of
data. The data between the individual tablets do not intersect and is
physically stored independently.
+In the Doris storage engine, data is horizontally partitioned into several
tablets. Each tablet contains several rows of data. There is no overlap between
the data in different tablets, and they are stored physically independently.
-Tablets are logically attributed to different Partitions. One Tablet belongs
to only one Partition, and one Partition contains several Tablets. Since the
tablets are physically stored independently, the partitions can be seen as
physically independent, too. Tablet is the smallest physical storage unit for
data operations such as movement and replication.
+Multiple tablets logically belong to different partitions. A single tablet
belongs to only one partition, while a partition contains several tablets.
Because tablets are stored physically independently, partitions can also be
considered physically independent. The tablet is the smallest physical storage
unit for operations such as data movement and replication.
-A Table is formed of multiple Partitions. Partition can be thought of as the
smallest logical unit of management. Data import and deletion can be performed
on only one Partition.
+Several partitions compose a table. The partition can be considered the
smallest logical management unit.
-## Data Partitioning
+Benefits of two-level data partitioning:
-The following illustrates on data partitioning in Doris using the example of a
CREATE TABLE operation.
+- For dimensions with time or similar ordered values, such dimension columns
can be used as partitioning columns. The partition granularity can be evaluated
based on import frequency and partition data volume.
+- Historical data deletion: If there is a need to delete historical data (for example, retaining only the data of the most recent several days), composite partitioning can be used to achieve this goal by dropping historical partitions. Alternatively, DELETE statements can be sent within specified partitions to delete data.
+- Solving data skew issues: Each partition can specify its number of buckets independently. For example, when partitioning by day and the data volume differs significantly between days, the number of buckets can be set per partition to distribute the data reasonably. It is recommended to choose a column with high cardinality as the bucketing column (see the sketch after this list).
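+
+A minimal sketch of the data-skew point above, assuming a hypothetical day-partitioned table `site_visits` that is hash-bucketed by `user_id`; an unusually busy day is given more buckets when its partition is added:
+
+```
+-- Hypothetical table and numbers: the partition for a heavy day gets 32 buckets
+ALTER TABLE site_visits ADD PARTITION p20240101
+VALUES [("2024-01-01"), ("2024-01-02"))
+DISTRIBUTED BY HASH(`user_id`) BUCKETS 32;
+```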
-CREATE TABLE in Doris is a synchronous command. It returns results after the
SQL execution is completed. Successful returns indicate successful table
creation. For more information on the syntax, please refer to [CREATE
TABLE](../sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE.md),
or input the `HELP CREATE TABLE;` command.
+### Example of creating a table
-This section introduces how to create tables in Doris.
+CREATE TABLE in Doris is a synchronous command. It returns results after the
SQL execution is completed. Successful returns indicate successful table
creation. For more information, please refer to [CREATE
TABLE](../sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE),
or input the `HELP CREATE TABLE;` command.
-```sql
--- Range Partition
+This section introduces how to create tables in Doris using range partitioning and hash buckets.
-CREATE TABLE IF NOT EXISTS example_db.example_range_tbl
+```
+-- Range Partition
+CREATE TABLE IF NOT EXISTS example_range_tbl
(
- `user_id` LARGEINT NOT NULL COMMENT "User ID",
+ `user_id` LARGEINT NOT NULL COMMENT "User ID",
`date` DATE NOT NULL COMMENT "Date when the data are imported",
`timestamp` DATETIME NOT NULL COMMENT "Timestamp when the data are
imported",
`city` VARCHAR(20) COMMENT "User location city",
@@ -70,355 +73,977 @@ CREATE TABLE IF NOT EXISTS example_db.example_range_tbl
`last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT
"User last visit time",
`cost` BIGINT SUM DEFAULT "0" COMMENT "Total user consumption",
`max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum user dwell time",
- `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time"
+ `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time"
)
-ENGINE=olap
+ENGINE=OLAP
AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
PARTITION BY RANGE(`date`)
(
- PARTITION `p201701` VALUES LESS THAN ("2017-02-01"),
- PARTITION `p201702` VALUES LESS THAN ("2017-03-01"),
- PARTITION `p201703` VALUES LESS THAN ("2017-04-01"),
- PARTITION `p2018` VALUES [("2018-01-01"), ("2019-01-01"))
+ PARTITION `p201701` VALUES [("2017-01-01"), ("2017-02-01")),
+ PARTITION `p201702` VALUES [("2017-02-01"), ("2017-03-01")),
+ PARTITION `p201703` VALUES [("2017-03-01"), ("2017-04-01"))
)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
PROPERTIES
(
- "replication_num" = "3",
- "storage_medium" = "SSD",
- "storage_cooldown_time" = "2018-01-01 12:00:00"
+ "replication_num" = "1"
);
+```
+
+Here, the AGGREGATE KEY data model is used as an example. In the AGGREGATE KEY data model, all columns that are specified with an aggregation type (SUM, REPLACE, MAX, or MIN) are Value columns. The rest are the Key columns.
+
+Relevant parameters can be set in the PROPERTIES section at the end of the CREATE TABLE statement. For a detailed description of these parameters, refer to the documentation on [CREATE TABLE](../sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE).
+
+The default ENGINE type is OLAP. In Doris, only the OLAP engine type has its data managed and stored by Doris itself. Other engine types, such as mysql, broker, and es, are essentially just mappings to tables in external databases or systems, which allow Doris to read that data. Doris itself does not create, manage, or store any tables or data for non-OLAP engine types.
+
+`IF NOT EXISTS` indicates that if the table has not been created before, it
will be created. Note that this only checks if the table name exists and does
not check if the schema of the new table is the same as the schema of an
existing table. Therefore, if there is a table with the same name but a
different schema, this command will also return successfully, but it does not
mean that a new table and a new schema have been created.
+
+### View partition
+
+You can use the `show create table` command to view the partition information
of a table.
+
+```
+> show create table example_range_tbl
++-------------------+---------------------------------------------------------------------------------------------------------+
+| Table | Create Table
|
++-------------------+---------------------------------------------------------------------------------------------------------+
+| example_range_tbl | CREATE TABLE `example_range_tbl` (
|
+| | `user_id` largeint(40) NOT NULL COMMENT 'User ID',
|
+| | `date` date NOT NULL COMMENT 'Date when the data are
imported', |
+| | `timestamp` datetime NOT NULL COMMENT 'Timestamp when
the data are imported', |
+| | `city` varchar(20) NULL COMMENT 'User location city',
|
+| | `age` smallint(6) NULL COMMENT 'User age',
|
+| | `sex` tinyint(4) NULL COMMENT 'User gender',
|
+| | `last_visit_date` datetime REPLACE NULL DEFAULT
"1970-01-01 00:00:00" COMMENT 'User last visit time', |
+| | `cost` bigint(20) SUM NULL DEFAULT "0" COMMENT 'Total
user consumption', |
+| | `max_dwell_time` int(11) MAX NULL DEFAULT "0" COMMENT
'Maximum user dwell time', |
+| | `min_dwell_time` int(11) MIN NULL DEFAULT "99999"
COMMENT 'Minimum user dwell time' |
+| | ) ENGINE=OLAP
|
+| | AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`,
`age`, `sex`) |
+| | COMMENT 'OLAP'
|
+| | PARTITION BY RANGE(`date`)
|
+| | (PARTITION p201701 VALUES [('0000-01-01'),
('2017-02-01')), |
+| | PARTITION p201702 VALUES [('2017-02-01'),
('2017-03-01')), |
+| | PARTITION p201703 VALUES [('2017-03-01'),
('2017-04-01'))) |
+| | DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
|
+| | PROPERTIES (
|
+| | "replication_allocation" = "tag.location.default: 1",
|
+| | "is_being_synced" = "false",
|
+| | "storage_format" = "V2",
|
+| | "light_schema_change" = "true",
|
+| | "disable_auto_compaction" = "false",
|
+| | "enable_single_replica_compaction" = "false"
|
+| | );
|
++-------------------+---------------------------------------------------------------------------------------------------------+
+```
+
+You can use the `show partitions from your_table` command to view the detailed partition information of a table.
+
+```
+> show partitions from example_range_tbl
++-------------+---------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------
++---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+
+| PartitionId | PartitionName | VisibleVersion | VisibleVersionTime | State
| PartitionKey | Range
| DistributionKey | Buckets | ReplicationNum | StorageMedium
+| CooldownTime | RemoteStoragePolicy | LastConsistencyCheckTime |
DataSize | IsInMemory | ReplicaAllocation | IsMutable |
++-------------+---------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------
++---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+
+| 28731 | p201701 | 1 | 2024-01-25 10:50:51 | NORMAL
| date | [types: [DATEV2]; keys: [0000-01-01]; ..types: [DATEV2]; keys:
[2017-02-01]; ) | user_id | 16 | 1 | HDD
+| 9999-12-31 23:59:59 | | | 0.000 |
false | tag.location.default: 1 | true |
+| 28732 | p201702 | 1 | 2024-01-25 10:50:51 | NORMAL
| date | [types: [DATEV2]; keys: [2017-02-01]; ..types: [DATEV2]; keys:
[2017-03-01]; ) | user_id | 16 | 1 | HDD
+| 9999-12-31 23:59:59 | | | 0.000 |
false | tag.location.default: 1 | true |
+| 28733 | p201703 | 1 | 2024-01-25 10:50:51 | NORMAL
| date | [types: [DATEV2]; keys: [2017-03-01]; ..types: [DATEV2]; keys:
[2017-04-01]; ) | user_id | 16 | 1 | HDD
+| 9999-12-31 23:59:59 | | | 0.000 |
false | tag.location.default: 1 | true |
++-------------+---------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------
++---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+
+```
+
+### Alter partition
+
+You can add a new partition by using the `alter table add partition` command.
+
+```
+ALTER TABLE example_range_tbl ADD PARTITION p201704 VALUES LESS THAN("2017-05-01") DISTRIBUTED BY HASH(`user_id`) BUCKETS 5;
+```
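+
+You can also drop an existing partition with the `alter table drop partition` command; a minimal sketch, reusing partition `p201701` from the example above:
+
+```
+-- Removes partition p201701 (and the data it holds) from the table
+ALTER TABLE example_range_tbl DROP PARTITION p201701;
+```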
+
+For more partition modification operations, please refer to the SQL manual on
[ALTER-TABLE-PARTITION](../sql-manual/sql-reference/Data-Definition-Statements/Alter/ALTER-TABLE-PARTITION).
+
+## Manual partitioning
+
+### Partition columns
+- Partition columns can be specified as one or multiple columns, and the partition columns must be KEY columns. The usage of multi-column partitioning is introduced later in the multi-column partitioning sections.
+- When `allowPartitionColumnNullable` is set to true, Range partition supports NULL partition columns. List partition never supports NULL partition columns.
+- Regardless of the type of the partition column, double quotes are required when writing partition values (see the example after this list).
+- There is theoretically no upper limit on the number of partitions.
+- When creating a table without partitioning, the system will automatically
generate a full-range partition with the same name as the table name. This
partition is not visible to users and cannot be deleted or modified.
+- Overlapping ranges are not allowed when creating partitions.
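+
+As a small illustration of the double-quote rule above, a sketch with a hypothetical INT partition column `age`; the partition bounds are quoted even though the column is numeric:
+
+```
+-- `age` is an INT column, yet its partition bounds are written as quoted strings
+PARTITION BY RANGE(`age`)
+(
+    PARTITION `p_young` VALUES LESS THAN ("30"),
+    PARTITION `p_mid` VALUES LESS THAN ("60")
+)
+```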
--- List Partition
+### Range partition
-CREATE TABLE IF NOT EXISTS example_db.example_list_tbl
+Partition columns are usually time columns for convenient management of old
and new data. Range partition supports column types such as DATE, DATETIME,
TINYINT, SMALLINT, INT, BIGINT, and LARGEINT.
+
+Partition information supports four writing methods:
+
+- FIXED RANGE: Defines each partition as a left-closed, right-open interval.
+
+```
+PARTITION BY RANGE(col1[, col2, ...])
+(
+ PARTITION partition_name1 VALUES [("k1-lower1", "k2-lower1",
"k3-lower1",...), ("k1-upper1", "k2-upper1", "k3-upper1", ...)),
+ PARTITION partition_name2 VALUES [("k1-lower1-2", "k2-lower1-2", ...),
("k1-upper1-2", MAXVALUE, ))
+)
+```
+
+For example:
+
+```
+PARTITION BY RANGE(`date`)
(
- `user_id` LARGEINT NOT NULL COMMENT "User ID",
- `date` DATE NOT NULL COMMENT "Date when the data are imported",
- `timestamp` DATETIME NOT NULL COMMENT "Timestamp when the data are
imported",
- `city` VARCHAR(20) COMMENT "User location city",
- `age` SMALLINT COMMENT "User Age",
- `sex` TINYINT COMMENT "User gender",
- `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT
"User last visit time",
- `cost` BIGINT SUM DEFAULT "0" COMMENT "Total user consumption",
- `max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum user dwell time",
- `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time"
+ PARTITION `p201701` VALUES [("2017-01-01"), ("2017-02-01")),
+ PARTITION `p201702` VALUES [("2017-02-01"), ("2017-03-01")),
+ PARTITION `p201703` VALUES [("2017-03-01"), ("2017-04-01"))
)
-ENGINE=olap
-AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
-PARTITION BY LIST(`city`)
+```
+
+- LESS THAN: Only defines the upper bound of each partition. The lower bound is determined by the upper bound of the previous partition.
+
+```
+PARTITION BY RANGE(col1[, col2, ...])
+(
+ PARTITION partition_name1 VALUES LESS THAN MAXVALUE | ("value1", "value2",
...),
+ PARTITION partition_name2 VALUES LESS THAN MAXVALUE | ("value1", "value2",
...)
+)
+```
+
+For example:
+
+```
+PARTITION BY RANGE(`date`)
+(
+ PARTITION `p201701` VALUES LESS THAN ("2017-02-01"),
+ PARTITION `p201702` VALUES LESS THAN ("2017-03-01"),
+ PARTITION `p201703` VALUES LESS THAN ("2017-04-01")
+)
+
+PARTITION BY RANGE(`date`)
+(
+ PARTITION `p201701` VALUES LESS THAN ("2017-02-01"),
+ PARTITION `p201702` VALUES LESS THAN ("2017-03-01"),
+    PARTITION `p201703` VALUES LESS THAN ("2017-04-01"),
+ PARTITION `other` VALUES LESS THAN (MAXVALUE)
+)
+```
+
+- BATCH RANGE: Creates RANGE partitions of numeric or time types in batches, defining the partitions as left-closed, right-open intervals and setting the step size.
+
+```
+PARTITION BY RANGE(int_col)
+(
+ FROM (start_num) TO (end_num) INTERVAL interval_value
+)
+
+PARTITION BY RANGE(date_col)
+(
+ FROM ("start_date") TO ("end_date") INTERVAL num YEAR | num MONTH | num
WEEK | num DAY | 1 HOUR
+)
+```
+
+For example:
+
+```
+PARTITION BY RANGE(age)
+(
+ FROM (1) TO (100) INTERVAL 10
+)
+
+PARTITION BY RANGE(`date`)
+(
+ FROM ("2000-11-14") TO ("2021-11-14") INTERVAL 2 YEAR
+)
+```
+
+- MULTI RANGE: Creates RANGE partitions in batches, defining the partitions as left-closed, right-open intervals. For example:
+
+```
+PARTITION BY RANGE(col)
+(
+ FROM ("2000-11-14") TO ("2021-11-14") INTERVAL 1 YEAR,
+ FROM ("2021-11-14") TO ("2022-11-14") INTERVAL 1 MONTH,
+ FROM ("2022-11-14") TO ("2023-01-03") INTERVAL 1 WEEK,
+ FROM ("2023-01-03") TO ("2023-01-14") INTERVAL 1 DAY,
+ PARTITION p_20230114 VALUES [('2023-01-14'), ('2023-01-15'))
+)
+```
+
+### List partition
+
+Partition columns support data types such as BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, DATE, DATETIME, CHAR, and VARCHAR. Partition values are enumerated values. A partition is hit only when the data matches one of the enumerated values of the target partition.
+
+Partitions support specifying the enumerated values contained in each
partition through VALUES IN (...).
+
+For example:
+
+```
+PARTITION BY LIST(city)
(
PARTITION `p_cn` VALUES IN ("Beijing", "Shanghai", "Hong Kong"),
PARTITION `p_usa` VALUES IN ("New York", "San Francisco"),
PARTITION `p_jp` VALUES IN ("Tokyo")
)
-DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
+```
+
+List partition also supports multi-column partitioning, for example:
+
+```
+PARTITION BY LIST(id, city)
+(
+ PARTITION p1_city VALUES IN (("1", "Beijing"), ("1", "Shanghai")),
+ PARTITION p2_city VALUES IN (("2", "Beijing"), ("2", "Shanghai")),
+ PARTITION p3_city VALUES IN (("3", "Beijing"), ("3", "Shanghai"))
+)
+```
+
+## Dynamic partition
+
+Dynamic partition is designed to manage the Time-to-Live (TTL) of partitions, reducing the burden on users.
+
+In some scenarios, users partition their tables by day and perform routine tasks every day. In that case, they need to manage partitions manually; otherwise, data loads may fail because the required partition has not been created, which brings additional maintenance costs.
+
+With dynamic partitioning, users can define rules for partition creation and
deletion when establishing tables. The FE initiates a background thread to
handle partition creation or deletion based on these user-defined rules. Users
also have the flexibility to modify these rules during runtime.
+
+It's important to note that dynamic partitioning is exclusively supported by
range partitions. Currently, the functionality enables dynamic addition and
deletion of partitions.
+
+:::tip
+
+This feature will be disabled when synchronized by CCR. If this table is
copied by CCR, that is, PROPERTIES contains `is_being_synced = true`, it will
be displayed as enabled in show create table, but will not actually take
effect. When `is_being_synced` is set to `false`, these features will resume
working, but the `is_being_synced` property is for CCR peripheral modules only
and should not be manually set during CCR synchronization.
+
+:::
+
+### How to use
+
+The rules for dynamic partitioning can be specified when the table is created
or modified at runtime.
+
+Currently, dynamic partition rules can only be set for partitioned tables with a single partition column.
+
+- Specified when creating table
+
+```
+CREATE TABLE tbl1
+(...)
PROPERTIES
(
- "replication_num" = "3",
- "storage_medium" = "SSD",
- "storage_cooldown_time" = "2018-01-01 12:00:00"
-);
+ "dynamic_partition.prop1" = "value1",
+ "dynamic_partition.prop2" = "value2",
+ ...
+)
+```
+
+- Modify at runtime
```
+ALTER TABLE tbl1 SET
+(
+ "dynamic_partition.prop1" = "value1",
+ "dynamic_partition.prop2" = "value2",
+ ...
+)
+```
+
+### Rule parameters
+
+The rules of dynamic partition are prefixed with `dynamic_partition.`:
+
+- `dynamic_partition.enable`
+
+    Whether to enable the dynamic partition feature. Can be specified as `TRUE` or `FALSE`. If not specified, the default is `TRUE`. If it is `FALSE`, Doris will ignore the dynamic partitioning rules of the table.
+
+- `dynamic_partition.time_unit` (required parameter)
+
+    The unit for dynamic partition scheduling. Can be specified as `HOUR`, `DAY`, `WEEK`, `MONTH`, or `YEAR`, meaning that partitions are created or deleted by hour, day, week, month, or year, respectively.
+
+ When specified as `HOUR`, the suffix format of the dynamically created
partition name is `yyyyMMddHH`, for example, `2020032501`. *When the time unit
is HOUR, the data type of partition column cannot be DATE.*
+
+ When specified as `DAY`, the suffix format of the dynamically created
partition name is `yyyyMMdd`, for example, `20200325`.
+
+ When specified as `WEEK`, the suffix format of the dynamically created
partition name is `yyyy_ww`. That is, the week of the year of current date. For
example, the suffix of the partition created for `2020-03-25` is `2020_13`,
indicating that it is currently the 13th week of 2020.
+
+ When specified as `MONTH`, the suffix format of the dynamically created
partition name is `yyyyMM`, for example, `202003`.
+
+ When specified as `YEAR`, the suffix format of the dynamically created
partition name is `yyyy`, for example, `2020`.
-### Definition of Column
+- `dynamic_partition.time_zone`
-Here we only use the AGGREGATE KEY data model as an example. See [Doris Data
Model](./data-model.md) for more information.
+    The time zone of the dynamic partition. If not specified, it defaults to the system time zone of the current machine, such as `Asia/Shanghai`. For the supported time zones, see [Timezone](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones).
-You can view the basic types of columns by executing `HELP CREATE TABLE;` in
MySQL Client.
+- `dynamic_partition.start`
-In the AGGREGATE KEY data model, all columns that are specified with an
aggregation type (SUM, REPLACE, MAX, or MIN) are Value columns. The rest are
the Key columns.
+    The starting offset of the dynamic partition, usually a negative number. Depending on the `time_unit` attribute, partitions whose range falls before this offset (relative to the current day, week, or month) will be deleted. If not specified, the default is `-2147483648`, which means history partitions will not be deleted.
-A few suggested rules for defining columns include:
+- `dynamic_partition.end` (required parameter)
-1. The Key columns must precede all Value columns.
-2. Try to choose the INT type as much as possible. Because calculations and
lookups on INT types are much more efficient than those on strings.
-3. For the lengths of the INT types, follow the **good enough** principle.
-4. For the lengths of the VARCHAR and STRING types, also follow the **good
enough** principle.
+    The end offset of the dynamic partition, usually a positive number. Depending on the `time_unit` attribute, partitions for the corresponding range are created in advance, based on the current day (week / month).
-### Partitioning and Bucketing
-Doris supports two layers of data partitioning. The first level is Partition,
including range partitioning and list partitioning. The second is Bucket
(Tablet), including hash and random partitioning.
+- `dynamic_partition.prefix` (required parameter)
-It is also possible to use one layer of data partitioning, If you do not write
the partition statement when creating the table, Doris will generate a default
partition at this time, which is transparent to the user. In this case, it only
supports data bucketing.
+ The dynamically created partition name prefix.
-1. Partition
- * You can specify one or more columns as the partitioning columns, but they
have to be KEY columns. The usage of multi-column partitions is described
further below.
- * Regardless of the type of the partitioning columns, double quotes are
required for partition values.
- * There is no theoretical limit on the number of partitions.
- * If users create a table without specifying the partitions, the system
will automatically generate a Partition with the same name as the table. This
Partition contains all data in the table and is neither visible to users nor
modifiable.
- * **Partitions should not have overlapping ranges**.
+- `dynamic_partition.buckets`
- #### Range Partitioning
+ The number of buckets corresponding to the dynamically created partitions.
- * Partitioning columns are usually time columns for easy management of old
and new data.
+- `dynamic_partition.replication_num`
- * Range partitioning support column type:
[DATE,DATETIME,TINYINT,SMALLINT,INT,BIGINT,LARGEINT]
+    The replication number of the dynamic partitions. If not specified, it defaults to the table's replication number.
- * Range partitioning supports specifying only the upper bound by `VALUES
LESS THAN (...)`. The system will use the upper bound of the previous partition
as the lower bound of the next partition, and generate a left-closed right-open
interval. It also supports specifying both the upper and lower bounds by
`VALUES [...)`, and generate a left-closed right-open interval.
+- `dynamic_partition.start_day_of_week`
- * The following takes the `VALUES [...)` method as an example since it is
more comprehensible. It shows how the partition ranges change as we use the
`VALUES LESS THAN (...)` statement to add or delete partitions:
+    When `time_unit` is `WEEK`, this parameter is used to specify the starting point of the week. The value ranges from 1 to 7, where 1 is Monday and 7 is Sunday. The default is 1, which means that every week starts on Monday.
- * As in the `example_range_tbl` example above, when the table is created,
the following 3 partitions are automatically generated:
- ```
- P201701: [MIN_VALUE, 2017-02-01)
- P201702: [2017-02-01, 2017-03-01)
- P201703: [2017-03-01, 2017-04-01)
- ```
+- `dynamic_partition.start_day_of_month`
- * If we add Partition p201705 VALUES LESS THAN ("2017-06-01"), the
results will be as follows:
+    When `time_unit` is `MONTH`, this parameter is used to specify the start date of each month. The value ranges from 1 to 28, where 1 means the 1st of every month and 28 means the 28th. The default is 1, which means that every month starts on the 1st. The 29th, 30th, and 31st are not supported for now, to avoid ambiguity caused by leap years or leap months.
- ```
- P201701: [MIN_VALUE, 2017-02-01)
- P201702: [2017-02-01, 2017-03-01)
- P201703: [2017-03-01, 2017-04-01)
- P201705: [2017-04-01, 2017-06-01)
- ```
- * Then we delete Partition p201703, the results will be as follows:
+- `dynamic_partition.create_history_partition`
- ```
- p201701: [MIN_VALUE, 2017-02-01)
- p201702: [2017-02-01, 2017-03-01)
- p201705: [2017-04-01, 2017-06-01)
- ```
+ The default is false. When set to true, Doris will automatically create all
partitions, as described in the creation rules below. At the same time, the
parameter `max_dynamic_partition_num` of FE will limit the total number of
partitions to avoid creating too many partitions at once. When the number of
partitions expected to be created is greater than `max_dynamic_partition_num`,
the operation will fail.
- > Note that the partition range of p201702 and p201705 has not changed,
and there is a gap between the two partitions: [2017-03-01, 2017-04-01). That
means, if the imported data is within this gap range, the import would fail.
+ When the `start` attribute is not specified, this parameter has no effect.
- * Now we go on and delete Partition p201702, the results will be as
follows:
+- `dynamic_partition.history_partition_num`
- ```
- p201701: [MIN_VALUE, 2017-02-01)
- p201705: [2017-04-01, 2017-06-01)
- ```
+ When `create_history_partition` is `true`, this parameter is used to specify
the number of history partitions. The default value is -1, which means it is
not set.
- > The gap range expands to: [2017-02-01, 2017-04-01)
+- `dynamic_partition.hot_partition_num`
- * Then we add Partition p201702new VALUES LESS THAN ("2017-03-01"), the
results will be as follows:
+    Specify how many of the latest partitions are hot partitions. For hot partitions, the system automatically sets their `storage_medium` parameter to SSD and sets `storage_cooldown_time`.
- ```
- p201701: [MIN_VALUE, 2017-02-01)
- p201702new: [2017-02-01, 2017-03-01)
- p201705: [2017-04-01, 2017-06-01)
- ```
+ :::tip
- > The gap range shrinks to: [2017-03-01, 2017-04-01)
+ If there is no SSD disk path under the storage path, configuring this
parameter will cause dynamic partition creation to fail.
- * Now we delete Partition p201701 and add Partition p201612 VALUES LESS
THAN ("2017-01-01"), the partition result is as follows:
+ :::
- ```
- p201612: [MIN_VALUE, 2017-01-01)
- p201702new: [2017-02-01, 2017-03-01)
- p201705: [2017-04-01, 2017-06-01)
- ```
+    `hot_partition_num` covers all partitions of the previous n days as well as all future partitions.
- > This results in a new gap range: [2017-01-01, 2017-02-01)
+ Let us give an example. Suppose today is 2021-05-20, partition by day, and
the properties of dynamic partition are set to: hot_partition_num=2, end=3,
start=-3. Then the system will automatically create the following partitions,
and set the `storage_medium` and `storage_cooldown_time` properties:
+
+ ```
+ p20210517: ["2021-05-17", "2021-05-18") storage_medium=HDD
storage_cooldown_time=9999-12-31 23:59:59
+ p20210518: ["2021-05-18", "2021-05-19") storage_medium=HDD
storage_cooldown_time=9999-12-31 23:59:59
+ p20210519: ["2021-05-19", "2021-05-20") storage_medium=SSD
storage_cooldown_time=2021-05-21 00:00:00
+ p20210520: ["2021-05-20", "2021-05-21") storage_medium=SSD
storage_cooldown_time=2021-05-22 00:00:00
+ p20210521: ["2021-05-21", "2021-05-22") storage_medium=SSD
storage_cooldown_time=2021-05-23 00:00:00
+ p20210522: ["2021-05-22", "2021-05-23") storage_medium=SSD
storage_cooldown_time=2021-05-24 00:00:00
+ p20210523: ["2021-05-23", "2021-05-24") storage_medium=SSD
storage_cooldown_time=2021-05-25 00:00:00
+ ```
+
+
+- `dynamic_partition.reserved_history_periods`
+
+    The range of reserved history periods. It should be in the form of `[yyyy-MM-dd,yyyy-MM-dd],[...,...]` when `dynamic_partition.time_unit` is DAY, WEEK, MONTH, or YEAR, and in the form of `[yyyy-MM-dd HH:mm:ss,yyyy-MM-dd HH:mm:ss],[...,...]` when `dynamic_partition.time_unit` is HOUR. No extra spaces are expected. The default value is `"NULL"`, which means it is not set.
+
+    Let us give an example. Suppose today is 2021-09-06, partitioned by day, and the properties of dynamic partition are set to:
+
+ ```time_unit="DAY/WEEK/MONTH/YEAR", end=3, start=-3,
reserved_history_periods="[2020-06-01,2020-06-20],[2020-10-31,2020-11-15]"```.
+
+    The system will automatically reserve the partitions in the following periods:
+
+ ```
+ ["2020-06-01","2020-06-20"],
+ ["2020-10-31","2020-11-15"]
+ ```
+
+ or
+
+ ```time_unit="HOUR", end=3, start=-3, reserved_history_periods="[2020-06-01
00:00:00,2020-06-01 03:00:00]"```.
+
+    The system will automatically reserve the partitions in the following periods:
+
+ ```
+ ["2020-06-01 00:00:00","2020-06-01 03:00:00"]
+ ```
- In summary, the deletion of a partition does not change the range of the
existing partitions, but might result in gaps. When a partition is added via
the `VALUES LESS THAN` statement, the lower bound of one partition is the upper
bound of its previous partition.
+    Each `[...,...]` in `reserved_history_periods` is a pair of values that must be set together, and the first date cannot be later than the second one.
- In addition to the single-column partitioning mentioned above, Range
Partitioning also supports **multi-column partitioning**. Examples are as
follows:
- ```text
- PARTITION BY RANGE(`date`, `id`)
- (
- PARTITION `p201701_1000` VALUES LESS THAN ("2017-02-01", "1000"),
- PARTITION `p201702_2000` VALUES LESS THAN ("2017-03-01", "2000"),
- PARTITION `p201703_all` VALUES LESS THAN ("2017-04-01")
- )
+- `dynamic_partition.storage_medium`
+
+ <version since="1.2.3"></version>
+
+ Specifies the default storage medium for the created dynamic partition. HDD
is the default, SSD can be selected.
+
+    Note that when set to SSD, the `hot_partition_num` property will no longer take effect; all partitions will default to the SSD storage medium and the cooldown time will be 9999-12-31 23:59:59.
+
+#### Create history partition rules
+
+When `create_history_partition` is `true`, i.e. history partition creation is
enabled, Doris determines the number of history partitions to be created based
on `dynamic_partition.start` and `dynamic_partition.history_partition_num`.
+
+Assuming the number of history partitions to be created is
`expect_create_partition_num`, the number is as follows according to different
settings.
+
+- `create_history_partition` = `true`
+ - `dynamic_partition.history_partition_num` is not set, i.e. -1.
+ `expect_create_partition_num` = `end` - `start`;
+ - `dynamic_partition.history_partition_num` is set
+ `expect_create_partition_num` = `end` - max(`start`,
`-history_partition_num`);
+- `create_history_partition` = `false`
+
+No history partition will be created, `expect_create_partition_num` = `end` -
0;
+
+When `expect_create_partition_num` is greater than `max_dynamic_partition_num` (default 500), partition creation is prohibited to avoid creating too many partitions at once.
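+
+If more history partitions are genuinely needed, the FE limit can be raised at runtime; a hedged sketch, assuming the `max_dynamic_partition_num` FE config is mutable like the FE configs shown later in this document:
+
+```
+-- Raise the cap before enabling a rule that creates many history partitions
+ADMIN SET FRONTEND CONFIG ("max_dynamic_partition_num" = "1000");
+```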
+
+**Examples:**
+
+Suppose today is 2021-05-20, partitioned by day, and the attributes of dynamic partition are set to `create_history_partition=true, end=3, start=-3, history_partition_num=1`; the system will automatically create the following partitions:
+
+```
+p20210519
+p20210520
+p20210521
+p20210522
+p20210523
+```
+
+If `history_partition_num=5` and the rest of the attributes are kept as above, the system will automatically create the following partitions:
+
+```
+p20210517
+p20210518
+p20210519
+p20210520
+p20210521
+p20210522
+p20210523
+```
+
+If `history_partition_num=-1`, i.e. the number of history partitions is not set, and the rest of the attributes are kept as above, the system will automatically create the following partitions:
+
+```
+p20210517
+p20210518
+p20210519
+p20210520
+p20210521
+p20210522
+p20210523
+```
+
+### Example
+
+1. Table `tbl1` has partition column k1 of type DATE. Create a dynamic partition rule: partition by day, keep only the partitions of the last 7 days, and create partitions for the next 3 days in advance.
+
+ ```
+ CREATE TABLE tbl1
+ (
+ k1 DATE,
+ ...
+ )
+ PARTITION BY RANGE(k1) ()
+ DISTRIBUTED BY HASH(k1)
+ PROPERTIES
+ (
+ "dynamic_partition.enable" = "true",
+ "dynamic_partition.time_unit" = "DAY",
+ "dynamic_partition.start" = "-7",
+ "dynamic_partition.end" = "3",
+ "dynamic_partition.prefix" = "p",
+ "dynamic_partition.buckets" = "32"
+ );
```
- In the above example, we specify `date` (DATE type) and `id` (INT type) as
the partitioning columns, so the resulting partitions will be as follows:
+ Suppose the current date is 2020-05-29. According to the above rules, tbl1
will produce the following partitions:
- ``` text
- *p201701_1000: [(MIN_VALUE, MIN_VALUE), ("2017-02-01", "1000") )
- *p201702_2000: [("2017-02-01", "1000"), ("2017-03-01", "2000") )
- *p201703_all: [("2017-03-01", "2000"), ("2017-04-01", MIN_VALUE))
- ```
+ ```
+ p20200529: ["2020-05-29", "2020-05-30")
+ p20200530: ["2020-05-30", "2020-05-31")
+ p20200531: ["2020-05-31", "2020-06-01")
+ p20200601: ["2020-06-01", "2020-06-02")
+ ```
- Note that in the last partition, the user only specifies the partition
value of the `date` column, so the system fills in `MIN_VALUE` as the partition
value of the `id` column by default. When data are imported, the system will
compare them with the partition values in order, and put the data in their
corresponding partitions. Examples are as follows:
+    On the next day, 2020-05-30, a new partition will be created: `p20200602: ["2020-06-02", "2020-06-03")`
- ``` text
- * Data --> Partition
- * 2017-01-01, 200 --> p201701_1000
- * 2017-01-01, 2000 --> p201701_1000
- * 2017-02-01, 100 --> p201701_1000
- * 2017-02-01, 2000 --> p201702_2000
- * 2017-02-15, 5000 --> p201702_2000
- * 2017-03-01, 2000 --> p201703_all
- * 2017-03-10, 1 --> p201703_all
- * 2017-04-01, 1000 --> Unable to import
- * 2017-05-01, 1000 --> Unable to import
- ```
+    On 2020-06-06, because `dynamic_partition.start` is set to -7, the partition from 7 days ago will be deleted, that is, the partition `p20200529` will be deleted.
+2. Table tbl1 has partition column k1 of type DATETIME. Create a dynamic partition rule: partition by week, keep only the partitions of the last 2 weeks, and create partitions for the next 2 weeks in advance.
-<version since="1.2.0">
-
+ ```
+ CREATE TABLE tbl1
+ (
+ k1 DATETIME,
+ ...
+ )
+ PARTITION BY RANGE(k1) ()
+ DISTRIBUTED BY HASH(k1)
+ PROPERTIES
+ (
+ "dynamic_partition.enable" = "true",
+ "dynamic_partition.time_unit" = "WEEK",
+ "dynamic_partition.start" = "-2",
+ "dynamic_partition.end" = "2",
+ "dynamic_partition.prefix" = "p",
+ "dynamic_partition.buckets" = "8"
+ );
+ ```
-Range partitioning also supports batch partitioning. For example, you can
create multiple partitions that are divided by day at a time using the `FROM
("2022-01-03") TO ("2022-01-06") INTERVAL 1 DAY`: 2022-01-03 to 2022-01-06 (not
including 2022-01-06), the results will be as follows:
+ Suppose the current date is 2020-05-29, which is the 22nd week of 2020. The
default week starts on Monday. Based on the above rules, tbl1 will produce the
following partitions:
- p20220103: [2022-01-03, 2022-01-04)
- p20220104: [2022-01-04, 2022-01-05)
- p20220105: [2022-01-05, 2022-01-06)
+ ```
+ p2020_22: ["2020-05-25 00:00:00", "2020-06-01 00:00:00")
+ p2020_23: ["2020-06-01 00:00:00", "2020-06-08 00:00:00")
+ p2020_24: ["2020-06-08 00:00:00", "2020-06-15 00:00:00")
+ ```
-</version>
-
+    The start date of each partition is the Monday of that week. Because the type of the partition column k1 is DATETIME, the partition values also include the hour, minute, and second fields, which are all 0.
-#### List Partitioning
+    On 2020-06-15, the 25th week, the partition from 2 weeks ago will be deleted, i.e. `p2020_22` will be deleted.
-* The partitioning columns support the `BOOLEAN, TINYINT, SMALLINT, INT,
BIGINT, LARGEINT, DATE, DATETIME, CHAR, VARCHAR` data types, and the partition
values are enumeration values. Partitions can be only hit if the data is one of
the enumeration values in the target partition.
+    In the above example, suppose the user specifies the start day of the week as `"dynamic_partition.start_day_of_week" = "3"`, that is, sets Wednesday as the start of the week. The partitions would be as follows:
-* List partitioning supports using `VALUES IN (...) ` to specify the
enumeration values contained in each partition.
+ ```
+ p2020_22: ["2020-05-27 00:00:00", "2020-06-03 00:00:00")
+ p2020_23: ["2020-06-03 00:00:00", "2020-06-10 00:00:00")
+ p2020_24: ["2020-06-10 00:00:00", "2020-06-17 00:00:00")
+ ```
-* The following example illustrates how partitions change when adding or
deleting a partition.
+ That is, the partition ranges from Wednesday of the current week to Tuesday
of the next week.
- * As in the `example_list_tbl` example above, when the table is created, the
following three partitions are automatically created.
+ :::tip
- ```text
- p_cn: ("Beijing", "Shanghai", "Hong Kong")
- p_usa: ("New York", "San Francisco")
- p_jp: ("Tokyo")
- ```
+    2019-12-31 and 2020-01-01 are in the same week. If the starting date of the partition is 2019-12-31, the partition name is `p2019_53`; if the starting date of the partition is 2020-01-01, the partition name is `p2020_01`.
+ :::
- * If we add Partition p_uk VALUES IN ("London"), the results will be as
follows:
+3. Table tbl1 has partition column k1 of type DATE. Create a dynamic partition rule: partition by month, do not delete historical partitions, and create partitions for the next 2 months in advance. At the same time, set the 3rd of each month as the starting date.
- ```text
- p_cn: ("Beijing", "Shanghai", "Hong Kong")
- p_usa: ("New York", "San Francisco")
- p_jp: ("Tokyo")
- p_uk: ("London")
- ```
+ ```
+ CREATE TABLE tbl1
+ (
+ k1 DATE,
+ ...
+ )
+ PARTITION BY RANGE(k1) ()
+ DISTRIBUTED BY HASH(k1)
+ PROPERTIES
+ (
+ "dynamic_partition.enable" = "true",
+ "dynamic_partition.time_unit" = "MONTH",
+ "dynamic_partition.end" = "2",
+ "dynamic_partition.prefix" = "p",
+ "dynamic_partition.buckets" = "8",
+ "dynamic_partition.start_day_of_month" = "3"
+ );
+ ```
- * Now we delete Partition p_jp, the results will be as follows:
+ Suppose the current date is 2020-05-29. Based on the above rules, tbl1 will
produce the following partitions:
- ```text
- p_cn: ("Beijing", "Shanghai", "Hong Kong")
- p_usa: ("New York", "San Francisco")
- p_uk: ("London")
- ```
+ ```
+ p202005: ["2020-05-03", "2020-06-03")
+ p202006: ["2020-06-03", "2020-07-03")
+ p202007: ["2020-07-03", "2020-08-03")
+ ```
- List partitioning also supports **multi-column partitioning**. Examples are
as follows:
+ Because `dynamic_partition.start` is not set, the historical partition will
not be deleted.
- ```text
- PARTITION BY LIST(`id`, `city`)
- (
- PARTITION `p1_city` VALUES IN (("1", "Beijing"), ("1", "Shanghai")),
- PARTITION `p2_city` VALUES IN (("2", "Beijing"), ("2", "Shanghai")),
- PARTITION `p3_city` VALUES IN (("3", "Beijing"), ("3", "Shanghai"))
- )
+    Assuming that today is 2020-05-20 and the 28th is set as the start of each month, the partition ranges are:
+
+ ```
+ p202004: ["2020-04-28", "2020-05-28")
+ p202005: ["2020-05-28", "2020-06-28")
+ p202006: ["2020-06-28", "2020-07-28")
+ ```
+
+### Modify dynamic partition properties
+
+You can modify the properties of the dynamic partition with the following command:
+
+```
+ALTER TABLE tbl1 SET
+(
+ "dynamic_partition.prop1" = "value1",
+ ...
+);
+```
+
+The modification of certain attributes may cause conflicts. Assume that the
partition granularity was DAY and the following partitions have been created:
+
+```
+p20200519: ["2020-05-19", "2020-05-20")
+p20200520: ["2020-05-20", "2020-05-21")
+p20200521: ["2020-05-21", "2020-05-22")
+```
+
+If the partition granularity is changed to MONTH at this time, the system will try to create a partition with the range `["2020-05-01", "2020-06-01")`, which conflicts with the existing partitions, so it cannot be created. The partition with the range `["2020-06-01", "2020-07-01")` can be created normally. Therefore, the partitions between 2020-05-22 and 2020-05-30 need to be filled manually.
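+
+For reference, such a granularity change would typically be made with a statement like the following (a sketch; the conflict behavior described above applies however the property is changed):
+
+```
+-- Switch the scheduling granularity of the existing dynamic partition rule
+ALTER TABLE tbl1 SET ("dynamic_partition.time_unit" = "MONTH");
+```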
+
+### Check dynamic partition table scheduling status
+
+You can further view the scheduling of dynamic partitioned tables by using the
following command:
+
+```
+mysql> SHOW DYNAMIC PARTITION TABLES;
++-----------+--------+----------+-------------+------+--------+---------+-----------+----------------+---------------------+--------+------------------------+----------------------+-------------------------+
+| TableName | Enable | TimeUnit | Start | End | Prefix | Buckets |
StartOf | LastUpdateTime | LastSchedulerTime | State |
LastCreatePartitionMsg | LastDropPartitionMsg | ReservedHistoryPeriods |
++-----------+--------+----------+-------------+------+--------+---------+-----------+----------------+---------------------+--------+------------------------+----------------------+-------------------------+
+| d3 | true | WEEK | -3 | 3 | p | 1 |
MONDAY | N/A | 2020-05-25 14:29:24 | NORMAL | N/A
| N/A | [2021-12-01,2021-12-31] |
+| d5 | true | DAY | -7 | 3 | p | 32 | N/A
| N/A | 2020-05-25 14:29:24 | NORMAL | N/A |
N/A | NULL |
+| d4 | true | WEEK | -3 | 3 | p | 1 |
WEDNESDAY | N/A | 2020-05-25 14:29:24 | NORMAL | N/A
| N/A | NULL |
+| d6 | true | MONTH | -2147483648 | 2 | p | 8 | 3rd
| N/A | 2020-05-25 14:29:24 | NORMAL | N/A |
N/A | NULL |
+| d2 | true | DAY | -3 | 3 | p | 32 | N/A
| N/A | 2020-05-25 14:29:24 | NORMAL | N/A |
N/A | NULL |
+| d7 | true | MONTH | -2147483648 | 5 | p | 8 | 24th
| N/A | 2020-05-25 14:29:24 | NORMAL | N/A |
N/A | NULL |
++-----------+--------+----------+-------------+------+--------+---------+-----------+----------------+---------------------+--------+------------------------+----------------------+-------------------------+
+7 rows in set (0.02 sec)
+```
+
+- LastUpdateTime: The last time the dynamic partition properties were modified
+- LastSchedulerTime: The last time dynamic partition scheduling was performed
+- State: The state of the last execution of dynamic partition scheduling
+- LastCreatePartitionMsg: Error message from the last scheduled dynamic partition creation
+- LastDropPartitionMsg: Error message from the last scheduled dynamic partition deletion
+
+### Advanced operation
+
+**FE Configuration Item**
+
+- dynamic\_partition\_enable
+
+ Whether to enable Doris's dynamic partition feature. The default value is
false, which is off. This parameter only affects the partitioning operation of
dynamic partition tables, not normal tables. You can modify the parameters in
`fe.conf` and restart FE to take effect. You can also execute the following
commands at runtime to take effect:
+
+ ```
+ MySQL protocol:
+
+ `ADMIN SET FRONTEND CONFIG ("dynamic_partition_enable" = "true")`
+
+ HTTP protocol:
+
+ `curl --location-trusted -u username:password -XGET
http://fe_host:fe_http_port/api/_set_config?dynamic_partition_enable=true`
```
- In the above example, we specify `id` (INT type) and `city` (VARCHAR type)
as the partitioning columns, so the resulting partitions will be as follows:
+ To turn off dynamic partitioning globally, set this parameter to false.
+
+- dynamic\_partition\_check\_interval\_seconds
+
+    The execution frequency of the dynamic partition thread defaults to 3600 seconds (1 hour), that is, scheduling is performed every hour. You can modify the parameter in `fe.conf` and restart FE for it to take effect. You can also run the following commands at runtime to modify it:
- ```text
- * p1_city: [("1", "Beijing"), ("1", "Shanghai")]
- * p2_city: [("2", "Beijing"), ("2", "Shanghai")]
- * p3_city: [("3", "Beijing"), ("3", "Shanghai")]
```
+ MySQL protocol:
+
+ `ADMIN SET FRONTEND CONFIG ("dynamic_partition_check_interval_seconds" =
"7200")`
- When data are imported, the system will compare them with the partition
values in order, and put the data in their corresponding partitions. Examples
are as follows:
- ```text
- Data ---> Partition
- 1, Beijing ---> p1_city
- 1, Shanghai ---> p1_city
- 2, Shanghai ---> p2_city
- 3, Beijing ---> p3_city
- 1, Tianjin ---> Unable to import
- 4, Beijing ---> Unable to import
+ HTTP protocol:
+
+ `curl --location-trusted -u username:password -XGET
http://fe_host:fe_http_port/api/_set_config?dynamic_partition_check_interval_seconds=432000`
```
-2. Bucketing
+**Converting dynamic and manual partition tables to each other**
+
+For a table, dynamic and manual partitioning can be freely converted, but they cannot coexist; a table is in exactly one of the two states at any time.
- * If you use the Partition method, the `DISTRIBUTED ...` statement will
describe how data are divided among partitions. If you do not use the Partition
method, that statement will describe how data of the whole table are divided.
- * You can specify multiple columns as the bucketing columns. In Aggregate
and Unique Models, bucketing columns must be Key columns; in the Duplicate
Model, bucketing columns can be Key columns and Value columns. Bucketing
columns can either be partitioning columns or not.
- * The choice of bucketing columns is a trade-off between **query
throughput** and **query concurrency**:
+**Converting Manual Partitioning to Dynamic Partitioning**
- 1. If you choose to specify multiple bucketing columns, the data will be
more evenly distributed. However, if the query condition does not include the
equivalent conditions for all bucketing columns, the system will scan all
buckets, largely increasing the query throughput and decreasing the latency of
a single query. This method is suitable for high-throughput, low-concurrency
query scenarios.
- 2. If you choose to specify only one or a few bucketing columns, point
queries might scan only one bucket. Thus, when multiple point queries are
preformed concurrently, they might scan various buckets, with no interaction
between the IO operations (especially when the buckets are stored on various
disks). This approach is suitable for high-concurrency point query scenarios.
+If a table is not dynamically partitioned when it is created, it can be
converted to dynamic partitioning at runtime by modifying the dynamic
partitioning properties with `ALTER TABLE`, an example of which can be seen
with `HELP ALTER TABLE`.
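+
+A minimal sketch of such a conversion, assuming a table `tbl1` that was created with manual day-level range partitions on a DATE column:
+
+```
+ALTER TABLE tbl1 SET
+(
+    "dynamic_partition.enable" = "true",
+    "dynamic_partition.time_unit" = "DAY",
+    "dynamic_partition.start" = "-7",
+    "dynamic_partition.end" = "3",
+    "dynamic_partition.prefix" = "p",
+    "dynamic_partition.buckets" = "32"
+);
+```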
- * AutoBucket: Calculates the number of partition buckets based on the
amount of data. For partitioned tables, you can determine a bucket based on the
amount of data, the number of machines, and the number of disks in the
historical partition.
- * There is no theoretical limit on the number of buckets.
+When the dynamic partitioning feature is enabled, Doris will no longer allow users to manage partitions manually, but will automatically manage partitions based on the dynamic partition properties.
-3. Recommendations on the number and data volume for Partitions and Buckets.
+:::tip
- * The total number of tablets in a table is equal to (Partition num *
Bucket num).
- * The recommended number of tablets in a table, regardless of capacity
expansion, is slightly more than the number of disks in the entire cluster.
- * The data volume of a single tablet does not have an upper or lower limit
theoretically, but is recommended to be in the range of 1G - 10G. Overly small
data volume of a single tablet can impose a stress on data aggregation and
metadata management; while overly large data volume can cause trouble in data
migration and completion, and increase the cost of Schema Change or Rollup
operation failures (These operations are performed on the Tablet level).
- * For the tablets, if you cannot have the ideal data volume and the ideal
quantity at the same time, it is recommended to prioritize the ideal data
volume.
- * Upon table creation, you specify the same number of Buckets for each
Partition. However, when dynamically increasing partitions (`ADD PARTITION`),
you can specify the number of Buckets for the new partitions separately. This
feature can help you cope with data reduction or expansion.
- * Once you have specified the number of Buckets for a Partition, you may
not change it afterwards. Therefore, when determining the number of Buckets,
you need to consider the need of cluster expansion in advance. For example, if
there are only 3 hosts, and each host has only 1 disk, and you have set the
number of Buckets is only set to 3 or less, then no amount of newly added
machines can increase concurrency.
- * For example, suppose that there are 10 BEs and each BE has one disk, if
the total size of a table is 500MB, you can consider dividing it into 4-8
tablets; 5GB: 8-16 tablets; 50GB: 32 tablets; 500GB: you may consider dividing
it into partitions, with each partition about 50GB in size, and 16-32 tablets
per partition; 5TB: divided into partitions of around 50GB and 16-32 tablets
per partition.
+If `dynamic_partition.start` is set, historical partitions with a partition
range before the start offset of the dynamic partition will be deleted.
- > Note: You can check the data volume of the table using the [show
data](../sql-manual/sql-reference/Show-Statements/SHOW-DATA.md) command. Divide
the returned result by the number of copies, and you will know the data volume
of the table.
+:::
-4. About the settings and usage scenarios of Random Distribution:
+**Converting Dynamic Partitioning to Manual Partitioning**
- * If the OLAP table does not have columns of REPLACE type, set the data
bucketing mode of the table to RANDOM. This can avoid severe data skew. (When
loading data into the partition corresponding to the table, each batch of data
in a single load task will be written into a randomly selected tablet).
- * When the bucketing mode of the table is set to RANDOM, since there are no
specified bucketing columns, it is impossible to query only a few buckets, so
all buckets in the hit partition will be scanned when querying the table. Thus,
this setting is only suitable for aggregate query analysis of the table data as
a whole, but not for highly concurrent point queries.
- * If the data distribution of the OLAP table is Random Distribution, you
can set `load to single tablet` to true when importing data. In this way, when
importing large amounts of data, in one task, data will be only written in one
tablet of the corresponding partition. This can improve both the concurrency
and throughput of data import and reduce write amplification caused by data
import and compaction, and thus, ensure cluster stability.
+The dynamic partitioning feature can be disabled by executing `ALTER TABLE tbl_name SET ("dynamic_partition.enable" = "false")`, which converts the table into a manually partitioned table.
-#### Compound Partitioning vs Single Partitioning
+When the dynamic partitioning feature is disabled, Doris will no longer manage partitions automatically, and users will have to create or delete partitions manually by using `ALTER TABLE`, as shown in the sketch below.
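+The following is a minimal sketch of this workflow. The table name `tbl_name`, the partition names, and the date ranges are placeholders, assuming a table that is range-partitioned by a date column:
+
+```sql
+-- Disable dynamic partitioning; Doris stops creating and dropping partitions automatically
+ALTER TABLE tbl_name SET ("dynamic_partition.enable" = "false");
+
+-- From this point on, partitions are managed manually
+ALTER TABLE tbl_name ADD PARTITION p20240401 VALUES [("2024-04-01"), ("2024-04-02"));
+ALTER TABLE tbl_name DROP PARTITION p20240301;
+```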
-Compound Partitioning
+## Auto partition
-- The first layer of data partitioning is called Partition. Users can specify
a dimension column as the partitioning column (currently only supports columns
of INT and TIME types), and specify the value range of each partition.
-- The second layer is called Distribution, which means bucketing. Users can
perform HASH distribution on data by specifying the number of buckets and one
or more dimension columns as the bucketing columns, or perform random
distribution on data by setting the mode to Random Distribution.
+:::tip
-Compound partitioning is recommended for the following scenarios:
+Automatic partitioning is supported starting from Doris version 2.1. To use this feature, please [download Doris 2.1](https://doris.apache.org/zh-CN/download) and refer to the documentation for version 2.1.
-- Scenarios with time dimensions or similar dimensions with ordered values,
which can be used as partitioning columns. The partitioning granularity can be
evaluated based on data import frequency, data volume, etc.
-- Scenarios with a need to delete historical data: If, for example, you only
need to keep the data of the last N days), you can use compound partitioning so
you can delete historical partitions. To remove historical data, you can also
send a DELETE statement within the specified partition.
-- Scenarios with a need to avoid data skew: you can specify the number of
buckets individually for each partition. For example, if you choose to
partition the data by day, and the data volume per day varies greatly, you can
customize the number of buckets for each partition. For the choice of bucketing
column, it is recommended to select the column(s) with variety in values.
+:::
-Users can also choose for single partitioning, which is about HASH
distribution.
+The Auto Partitioning feature automatically detects, during data import, whether the corresponding partition exists. If it does not exist, the partition is created automatically and the data is imported as usual.
-### PROPERTIES
+The auto partition function mainly addresses the case where the user wants to partition a table by a certain column, but the values of that column are scattered or unpredictable, so it is difficult to create the required partitions accurately when creating or adjusting the table structure, or where the number of partitions is so large that creating them manually is too cumbersome.
-In the `PROPERTIES` section at the last of the CREATE TABLE statement, you can
set the relevant parameters. Please see [CREATE
TABLE](../sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE.md)
for a detailed introduction.
+Take a time-type partition column as an example. With the Dynamic Partition function, we support the automatic creation of new partitions to accommodate real-time data for specific time periods. For real-time user behavior logs and similar scenarios, this feature largely meets the requirements. However, in more complex scenarios, such as handling non-real-time data, the partition column is independent of the current system time and contains a large number of discrete values. At this t [...]
-### ENGINE
+Suppose our table DDL is as follows:
+
+```sql
+CREATE TABLE `DAILY_TRADE_VALUE`
+(
+ `TRADE_DATE` datev2 NOT NULL COMMENT 'Trade date',
+ `TRADE_ID` varchar(40) NOT NULL COMMENT 'Trade ID',
+ ......
+)
+UNIQUE KEY(`TRADE_DATE`, `TRADE_ID`)
+PARTITION BY RANGE(`TRADE_DATE`)
+(
+ PARTITION p_2000 VALUES [('2000-01-01'), ('2001-01-01')),
+ PARTITION p_2001 VALUES [('2001-01-01'), ('2002-01-01')),
+ PARTITION p_2002 VALUES [('2002-01-01'), ('2003-01-01')),
+ PARTITION p_2003 VALUES [('2003-01-01'), ('2004-01-01')),
+ PARTITION p_2004 VALUES [('2004-01-01'), ('2005-01-01')),
+ PARTITION p_2005 VALUES [('2005-01-01'), ('2006-01-01')),
+ PARTITION p_2006 VALUES [('2006-01-01'), ('2007-01-01')),
+ PARTITION p_2007 VALUES [('2007-01-01'), ('2008-01-01')),
+ PARTITION p_2008 VALUES [('2008-01-01'), ('2009-01-01')),
+ PARTITION p_2009 VALUES [('2009-01-01'), ('2010-01-01')),
+ PARTITION p_2010 VALUES [('2010-01-01'), ('2011-01-01')),
+ PARTITION p_2011 VALUES [('2011-01-01'), ('2012-01-01')),
+ PARTITION p_2012 VALUES [('2012-01-01'), ('2013-01-01')),
+ PARTITION p_2013 VALUES [('2013-01-01'), ('2014-01-01')),
+ PARTITION p_2014 VALUES [('2014-01-01'), ('2015-01-01')),
+ PARTITION p_2015 VALUES [('2015-01-01'), ('2016-01-01')),
+ PARTITION p_2016 VALUES [('2016-01-01'), ('2017-01-01')),
+ PARTITION p_2017 VALUES [('2017-01-01'), ('2018-01-01')),
+ PARTITION p_2018 VALUES [('2018-01-01'), ('2019-01-01')),
+ PARTITION p_2019 VALUES [('2019-01-01'), ('2020-01-01')),
+ PARTITION p_2020 VALUES [('2020-01-01'), ('2021-01-01')),
+ PARTITION p_2021 VALUES [('2021-01-01'), ('2022-01-01'))
+)
+DISTRIBUTED BY HASH(`TRADE_DATE`) BUCKETS 10
+PROPERTIES (
+ "replication_num" = "1"
+);
+```
+
+The table stores a large amount of historical business data, partitioned by the date on which the transaction occurred. As you can see, when building the table we need to create the partitions manually in advance. If the value range of the partition column changes, for example when data for 2022 is added to the table above, we need to create a new partition via [ALTER-TABLE-PARTITION](../../sql-manual/sql-reference/Data-Definition-Statements/Alter/ALTER-TABLE-PARTITION) to make changes to the table partitions (see the sketch below). [...]
+
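+As an illustration only (the partition name and range below are not part of the original DDL), manually extending the table above when data for 2022 arrives would look like this:
+
+```sql
+-- Manually add a yearly partition for 2022 before importing the new data
+ALTER TABLE `DAILY_TRADE_VALUE`
+ADD PARTITION p_2022 VALUES [('2022-01-01'), ('2023-01-01'));
+```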
+## Manual bucketing
+
+If partitions are used, the `DISTRIBUTED ...` statement describes the rules for dividing data within each partition.
+
+If partitions are not used, it describes the rules for dividing the data
across the entire table.
+
+It is also possible to specify a bucketing method for each partition
individually.
+
+Multiple bucket columns can be specified. For the Aggregate and Unique models, the bucket columns must be Key columns; for the Duplicate model, they can be either Key or Value columns. Bucket columns can be the same as or different from partition columns.
+
+The choice of bucket columns involves a trade-off between query throughput and query concurrency (see the sketch after this list):
+
+- If multiple bucket columns are selected, the data distribution will be more
uniform. If a query condition does not include equal conditions for all bucket
columns, the query will trigger simultaneous scanning of all buckets,
increasing query throughput and reducing the latency of individual queries.
This approach is suitable for high-throughput, low-concurrency query scenarios.
+- If only one or a few bucket columns are selected, a point query can trigger
scanning of just one bucket. In this case, when multiple point queries are
concurrent, there is a higher probability that they will trigger scanning of
different buckets, reducing the IO impact between queries (especially when
different buckets are distributed across different disks). Therefore, this
approach is suitable for high-concurrency point query scenarios.
+
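+A minimal sketch of the two choices, using hypothetical columns `user_id` and `event_date` that are not part of any example above:
+
+```sql
+-- (a) Multiple bucket columns: more uniform data distribution; a query without
+--     equality conditions on all bucket columns scans every bucket
+DISTRIBUTED BY HASH(`user_id`, `event_date`) BUCKETS 16
+
+-- (b) A single bucket column: a point query on user_id touches only one bucket,
+--     which suits high-concurrency point query scenarios
+DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
+```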
+### Recommendations for bucket number and data volume
+
+- The total number of tablets for a table is equal to (Partition num * Bucket
num).
+- Without considering expansion, it is recommended that the number of tablets
for a table be slightly more than the total number of disks in the cluster.
+- In theory, there is no upper or lower limit on the data volume of a single tablet, but a range of 1G - 10G is recommended. If the data volume of a single tablet is too small, data aggregation will be less effective and metadata management pressure will be high. If the data volume is too large, it will hinder the migration and repair of replicas and increase the cost of retrying failed operations such as Schema Change or Rollup ( [...]
+- When there is a conflict between the data volume principle and the quantity
principle of tablets, it is recommended to prioritize the data volume principle.
+- When creating a table, the bucket number for each partition is specified uniformly. However, when adding partitions dynamically (`ADD PARTITION`), the bucket number for the new partition can be specified separately (see the sketch after this list). This feature can be conveniently used to handle data reduction or expansion.
+- Once the bucket number for a partition is specified, it cannot be changed.
Therefore, when determining the bucket number, it is necessary to consider the
cluster expansion scenario in advance. For example, if there are only 3 hosts
with 1 disk each, and the bucket number is set to 3 or less, then even if more
machines are added later, the concurrency cannot be improved.
+
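+A minimal sketch of specifying a separate bucket number for a newly added partition, assuming a hypothetical table `example_tbl` that is range-partitioned by a date column and already bucketed by HASH(`user_id`):
+
+```sql
+-- The new partition gets 8 buckets, independent of the bucket number used at table creation
+ALTER TABLE example_tbl
+ADD PARTITION p20240401 VALUES [("2024-04-01"), ("2024-05-01"))
+DISTRIBUTED BY HASH(`user_id`) BUCKETS 8;
+```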
+Here are some examples, assuming there are 10 BEs, each with one disk: if a table has a total size of 500MB, 4-8 tablets can be considered; for 5GB, 8-16 tablets; for 50GB, 32 tablets; for 500GB, it is recommended to partition the table, with each partition around 50GB in size and 16-32 tablets per partition; for 5TB, the same applies: partitions of around 50GB with 16-32 tablets per partition.
+
+The data volume of a table can be viewed using the [SHOW
DATA](../sql-manual/sql-reference/Show-Statements/SHOW-DATA) command, and the
result should be divided by the number of replicas to obtain the actual data
volume of the table.
+
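+For example (`example_db.example_tbl` is a placeholder name):
+
+```sql
+-- Returns the total size of the table; divide it by the number of replicas
+-- (replication_num) to estimate the actual data volume of the table
+SHOW DATA FROM example_db.example_tbl;
+```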
+### Random distribution
+
+- If an OLAP table does not have columns of an update type (such as REPLACE aggregation columns), setting the data bucketing mode of the table to RANDOM can avoid severe data skew (see the sketch after this list). When data is imported into the corresponding partitions of the table, each batch of a single import job will randomly select a tablet to write to.
+- When the bucketing mode of a table is set to RANDOM, there is no bucketing column, so it is not possible to prune the query to a few buckets based on bucketing column values; queries on the table scan all buckets of the partitions that are hit. This setting is suitable for aggregate analysis of the table data as a whole, but not for high-concurrency point queries.
+- If the data distribution of the OLAP table is Random Distribution, then
during data import, single-tablet import mode can be set (set
`load_to_single_tablet` to true). Then, during large-volume data import, a task
will only write to one tablet when writing data to the corresponding partition.
This can improve the concurrency and throughput of data import, reduce the
write amplification caused by data import and compaction, and ensure the
stability of the cluster.
+
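+A minimal sketch of a RANDOM-distributed table; the table name, columns, and bucket number below are illustrative only:
+
+```sql
+CREATE TABLE example_log
+(
+    `log_time` datetime NOT NULL,
+    `msg`      varchar(1024)
+)
+DUPLICATE KEY(`log_time`)
+DISTRIBUTED BY RANDOM BUCKETS 16
+PROPERTIES ("replication_num" = "1");
+```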
+## Auto bucket
+
+Users often encounter various issues due to improper bucket settings. To
address this, we provide an automated approach for setting the number of
buckets, which is currently applicable only to OLAP tables.
+
+:::tip
+
+This feature is disabled when the table is synchronized by CCR, that is, when PROPERTIES contains `is_being_synced = true`. In that case it will still be displayed as enabled in `SHOW CREATE TABLE`, but it will not actually take effect. When `is_being_synced` is set to `false`, the feature resumes working. Note that the `is_being_synced` property is intended only for CCR peripheral modules and should not be set manually during CCR synchronization.
+
+:::
+
+In the past, users had to set the number of buckets manually when creating a table. The automatic bucket feature lets Apache Doris dynamically project the number of buckets, so that it always stays within a suitable range and users do not have to worry about the details of bucket counts.
+
+For the sake of clarity, this section splits bucketing into two phases, the initial bucketing and the subsequent bucketing; "initial" and "subsequent" are just terms used in this article to describe the feature clearly, and there is no such thing as an initial or subsequent bucket in Apache Doris itself.
+
+As we know from the section on bucketing above, the `BUCKET_DESC` clause is very simple, but you need to specify the number of buckets; with the automatic bucket projection feature, the number of buckets in `BUCKET_DESC` is simply changed to `AUTO` and a new property is added.
+
+```sql
+-- old version of the creation syntax for specifying the number of buckets
+DISTRIBUTED BY HASH(site) BUCKETS 20
+
+-- Newer versions use the creation syntax for automatic bucket projection
+DISTRIBUTED BY HASH(site) BUCKETS AUTO
+properties("estimate_partition_size" = "100G")
+```
+
+The new property `estimate_partition_size` indicates the estimated amount of data in a single partition. It is optional; if not given, Doris uses a default value of 10GB.
+
+As described above, a bucket of a partition is a tablet at the physical level, and for best performance it is recommended that the tablet size be in the range of 1GB - 10GB. So how does automatic bucket projection ensure that the tablet size stays within this range?
+
+To summarize, there are a few principles.
+
+- If the overall data volume is small, the number of buckets should not be set
too high
+- If the overall data volume is large, the number of buckets should be related to the total number of disks, so as to fully utilize the capacity of each BE machine and each disk
+
+### Initial bucketing projection
+
+1. Obtain a number of buckets N based on the data size. First, we divide the value of `estimate_partition_size` by 5 (assuming a data compression ratio of 5:1 when text-format data is stored in Doris). The result is mapped as follows:
+
+```
+(0, 100MB)   : take N = 1
+[100MB, 1GB) : take N = 2
+[1GB, +inf)  : one bucket per GB
+```
+
+2. Calculate a number of buckets M based on the number of BE nodes and the disk capacity of each BE node.
+
+```
+Each BE node counts as 1, and every 50GB of disk capacity counts as 1.
+
+The calculation rule for M is: M = number of BE nodes * (size of one disk / 50GB) * number of disks per BE.
+
+For example: if there are 3 BEs, and each BE has 4 disks of 500GB, then M = 3 * (500GB / 50GB) * 4 = 120.
+```
+
+3. Calculation logic to get the final number of buckets.
+
+```
+Calculate an intermediate value x = min(M, N, 128).
+
+If x < N and x < the number of BE nodes, the final number of buckets is y,
+which is the number of BE nodes; otherwise, the final number of buckets is x.
+```
+
+4. Finally, x = max(x, autobucket_min_buckets), where autobucket_min_buckets is configured in Config (the default is 1).
+
+The pseudo-code representation of the above process is as follows
+
+```
+int N = compute the N value;
+int M = compute the M value;
+
+int y = number of BE nodes;
+int x = min(M, N, 128);
+
+if (x < N && x < y) {
+ return y;
+}
+return x;
+```
+
+With the above algorithm in mind, let's introduce some examples to better
understand this part of the logic.
+
+```
+case1:
+Amount of data 100 MB, 10 BE machines, 2TB * 3 disks
+Amount of data N = 1
+BE disks M = 10* (2TB/50GB) * 3 = 1230
+x = min(M, N, 128) = 1
+Final: 1
+
+case2:
+Data volume 1GB, 3 BE machines, 500GB * 2 disks
+Amount of data N = 2
+BE disks M = 3* (500GB/50GB) * 2 = 60
+x = min(M, N, 128) = 2
+Final: 2
+
+case3:
+Data volume 100GB, 3 BE machines, 500GB * 2 disks
+Amount of data N = 20
+BE disks M = 3* (500GB/50GB) * 2 = 60
+x = min(M, N, 128) = 20
+Final: 20
+
+case4:
+Data volume 500GB, 3 BE machines, 1TB * 1 disk
+Data volume N = 100
+BE disks M = 3* (1TB / 50GB) * 1 = 63
+x = min(M, N, 128) = 63
+Final: 63
+
+case5:
+Data volume 500GB, 10 BE machines, 2TB * 3 disks
+Amount of data N = 100
+BE disks M = 10* (2TB / 50GB) * 3 = 1230
+x = min(M, N, 128) = 100
+Final: 100
+
+case 6:
+Data volume 1TB, 10 BE machines, 2TB * 3 disks
+Amount of data N = 205
+BE disks M = 10* (2TB / 50GB) * 3 = 1230
+x = min(M, N, 128) = 128
+Final: 128
+
+case 7:
+Data volume 500GB, 1 BE machine, 100TB * 1 disk
+Amount of data N = 100
+BE disk M = 1* (100TB / 50GB) * 1 = 2048
+x = min(M, N, 128) = 100
+Final: 100
+
+case 8:
+Data volume 1TB, 200 BE machines, 4TB * 7 disks
+Amount of data N = 205
+BE disks M = 200* (4TB / 50GB) * 7 = 114800
+x = min(M, N, 128) = 128
+Final: 200
+```
+
+### Subsequent bucketing projection
+
+The above is the calculation logic for the initial bucketing. Subsequent buckets can be estimated based on the partition data that already exists. The subsequent bucket size is evaluated based on the EMA[1] (short-term exponential moving average) value of up to the previous 7 partitions, which is used as the estimate_partition_size. At this point there are two ways to calculate the partition buckets, assuming partitioning by da [...]
+
+- If the partition data over the 7 days is strictly increasing day by day, then the trend value is used. There are 6 delta values:
+
+```
+S7 - S6 = delta1,
+S6 - S5 = delta2,
+...
+S2 - S1 = delta6
+```
-In this example, the ENGINE is of OLAP type, which is the default ENGINE type.
In Doris, only the OALP ENGINE type is managed and stored by Doris. Other
ENGINE types, such as MySQL, Broker, ES, are essentially mappings to tables in
other external databases or systems to ensure that Doris can read the data. And
Doris itself does not create, manage, or store any tables and data of non-OLAP
ENGINE type.
+This yields the ema(delta) value. Then, today's estimate_partition_size = S7 + ema(delta).
-### Other
+- Otherwise, the EMA of the previous days is taken directly. Today's estimate_partition_size = EMA(S1, ..., S7).
-`IF NOT EXISTS` means to create the table if it is non-existent. Note that the
system only checks the existence of table based on the table name, but not
compare the schema of the newly created table with the existing ones. So if
there exists a table of the same name but different schema, the command will
also return, but it does not mean that a new table of a new schema has been
created.
+:::tip
-## FAQ
+According to the above algorithm, both the initial number of buckets and the number of subsequent buckets can be calculated. Unlike before, when only a fixed number of buckets could be specified, the number of buckets may differ between one partition and the next as business data changes. This is transparent to the user: the user does not need to care about the exact number of buckets in each partition, and this automat [...]
-### Table Creation
+:::
-1. If a syntax error occurs in a long CREATE TABLE statement, the error
message may be incomplete. Here is a list of possible syntax errors for your
reference in manual touble shooting:
+## Common Issues
- * Incorrect syntax. Please use `HELP CREATE TABLE;`to check the relevant
syntax.
- * Reserved words. Reserved words in user-defined names should be enclosed
in backquotes ``. It is recommended that all user-defined names be enclosed in
backquotes.
- * Chinese characters or full-width characters. Non-UTF8 encoded Chinese
characters, or hidden full-width characters (spaces, punctuation, etc.) can
cause syntax errors. It is recommended that you check for these characters
using a text editor that can display non-printable characters.
+1. For longer table creation statements, the syntax error message may be incomplete. Here are some possible syntax errors to check for manually:
-2. `Failed to create partition [xxx] . Timeout`
+ - Syntax structure errors. Please carefully read [HELP CREATE
TABLE](../sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE)
and check the relevant syntax structure.
+ - Reserved words. When a user-defined name uses a reserved word, it must be enclosed in backticks ``. It is recommended to enclose all user-defined names in backticks.
+ - Chinese characters or full-width characters. Non-UTF8 encoded Chinese
characters or hidden full-width characters (spaces, punctuation, etc.) can
cause syntax errors. It is recommended to use a text editor that displays
invisible characters for inspection.
- In Doris, tables are created in the order of the partitioning granularity.
This error prompt may appear when a partition creation task fails, but it could
also appear in table creation tasks with no partitioning operations, because,
as mentioned earlier, Doris will create an unmodifiable default partition for
tables with no partitions specified.
+2. Failed to create partition [xxx]. Timeout
- This error usually pops up because the tablet creation goes wrong in BE.
You can follow the steps below for troubleshooting:
+ Doris creates tables sequentially based on partition granularity. When a
partition fails to create, this error may occur. Even if partitions are not
used, when there is a problem with table creation, `Failed to create partition`
may still be reported because, as mentioned earlier, Doris creates an
unmodifiable default partition for tables without specified partitions.
- 1. In fe.log, find the `Failed to create partition` log of the
corresponding time point. In that log, find a number pair that looks like
`{10001-10010}` . The first number of the pair is the Backend ID and the second
number is the Tablet ID. As for `{10001-10010}`, it means that on Backend ID
10001, the creation of Tablet ID 10010 failed.
- 2. After finding the target Backend, go to the corresponding be.INFO log
and find the log of the target tablet, and then check the error message.
- 3. A few common tablet creation failures include but not limited to:
- * The task is not received by BE. In this case, the tablet ID related
information will be found in be.INFO, or the creation is successful in BE but
it still reports a failure. To solve the above problems, see [Installation and
Deployment](https://doris.apache.org/docs/dev/install/standard-deployment/)
about how to check the connectivity of FE and BE.
- * Pre-allocated memory failure. It may be that the length of a row in
the table exceeds 100KB.
- * `Too many open files`. The number of open file descriptors exceeds the
Linux system limit. In this case, you need to change the open file descriptor
limit of the Linux system.
+ This error usually occurs because the BE encountered a problem when creating data tablets. You can troubleshoot by following these steps:
- If it is a timeout error, you can set `tablet_create_timeout_second=xxx`
and `max_create_table_timeout_second=xxx` in fe.conf. The default value of
`tablet_create_timeout_second=xxx` is 1 second, and that of
`max_create_table_timeout_second=xxx` is 60 seconds. The overall timeout would
be min(tablet_create_timeout_second * replication_num,
max_create_table_timeout_second). For detailed parameter settings, please check
[FE Configuration](https://doris.apache.org/docs/dev/admin-manual/ [...]
+ - In the fe.log, search for the `Failed to create partition` log entry at
the corresponding timestamp. In this log entry, you may find a series of number
pairs similar to `{10001-10010}`. The first number in the pair represents the
Backend ID, and the second number represents the Tablet ID. For example, this
number pair indicates that the creation of Tablet ID 10010 on Backend ID 10001
failed.
+ - Go to the be.INFO log of the corresponding Backend and search for tablet
ID-related logs within the corresponding time period to find error messages.
+ - Here are some common tablet creation failure errors, including but not
limited to:
+ - The BE did not receive the relevant task. In this case, you cannot find
tablet ID-related logs in be.INFO or the BE reports success but actually fails.
For these issues, please refer to the [Installation and
Deployment](../install/cluster-deployment/standard-deployment) section to check
the connectivity between FE and BE.
+ - Pre-allocated memory failure. This may be because the byte length of a
row in the table exceeds 100KB.
+ - `Too many open files`. The number of open file handles exceeds the
Linux system limit. You need to modify the handle limit of the Linux system.
-3. The build table command does not return results for a long time.
+* If there is a timeout when creating data tablets, you can also extend the
timeout by setting `tablet_create_timeout_second=xxx` and
`max_create_table_timeout_second=xxx` in the fe.conf file. By default,
`tablet_create_timeout_second` is set to 1 second, and
`max_create_table_timeout_second` is set to 60 seconds. The overall timeout is
calculated as `min(tablet_create_timeout_second * replication_num,
max_create_table_timeout_second)`. For specific parameter settings, please
refer to th [...]
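+For instance, the current values of these FE configuration items can be checked from the MySQL client before editing fe.conf (a minimal sketch; with the defaults and `replication_num = 3`, the overall timeout is min(1 * 3, 60) = 3 seconds):
+
+```sql
+-- List timeout-related FE configuration items and their current values
+ADMIN SHOW FRONTEND CONFIG LIKE "%timeout%";
+```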
- Doris's table creation command is a synchronous command. The timeout of
this command is currently set to be relatively simple, ie (tablet num *
replication num) seconds. If you create more data fragments and have fragment
creation failed, it may cause an error to be returned after waiting for a long
timeout.
+3. The table creation command does not return results for a long time.
- Under normal circumstances, the statement will return in a few seconds or
ten seconds. If it is more than one minute, it is recommended to cancel this
operation directly and go to the FE or BE log to view the related errors.
+* Doris's table creation command is a synchronous command. The timeout for
this command is currently set simply as (tablet num * replication num) seconds.
If many data tablets are created and some of them fail to create, it may result
in a long wait before returning an error.
+* Under normal circumstances, the table creation statement should return
within a few seconds or tens of seconds. If it exceeds one minute, it is
recommended to cancel the operation directly and check the relevant errors in
the FE or BE logs.
-## More Help
+## Additional Help
-For more detailed instructions on data partitioning, please refer to the
[CREATE
TABLE](../sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE.md)
command manual, or enter `HELP CREATE TABLE;` in MySQL Client.
+For more detailed information on data partitioning, you can refer to the
[CREATE
TABLE](../sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE)
command manual or enter `HELP CREATE TABLE;` in the MySQL client to get more
help information.
\ No newline at end of file
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]