This is an automated email from the ASF dual-hosted git repository.
luzhijing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new db10894c567a [doc]update en docs of data partition (#475)
db10894c567a is described below
commit db10894c567a05a52daabee02b0965cdf984db12
Author: Luzhijing <[email protected]>
AuthorDate: Wed Mar 27 11:28:47 2024 +0800
[doc]update en docs of data partition (#475)
---
.../version-2.0/table-design/data-partition.md | 6 +-
.../version-2.0/table-design/data-partition.md | 1125 +++++++++++++++-----
2 files changed, 878 insertions(+), 253 deletions(-)
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/data-partition.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/data-partition.md
index d58028d40617..774b1beb5e7f 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/data-partition.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/table-design/data-partition.md
@@ -402,7 +402,7 @@ PARTITION BY LIST(id, city)
- `dynamic_partition.start_day_of_month`
- 当 `time_unit` 为 `MONTH` 时,该参数用于指定每月的起始日期。取值为 1 到 28。其中 1 表示每月 1 号,28 表示每月 28
号。默认为 1,即表示每月以 1 号位起始点。暂不支持以 29、30、31 号为起始日,以避免因闰年或闰月带来的歧义。
+ 当 `time_unit` 为 `MONTH` 时,该参数用于指定每月的起始日期。取值为 1 到 28。其中 1 表示每月 1 号,28 表示每月 28
号。默认为 1,即表示每月以 1 号为起始点。暂不支持以 29、30、31 号为起始日,以避免因闰年或闰月带来的歧义。
- `dynamic_partition.create_history_partition`
@@ -474,7 +474,7 @@ PARTITION BY LIST(id, city)
假设需要创建的历史分区数量为 `expect_create_partition_num`,根据不同的设置具体数量如下:
- create_history_partition = true
-
+
dynamic_partition.history_partition_num 未设置,即 -1.
expect_create_partition_num = end - start;
dynamic_partition.history_partition_num 已设置 expect_create_partition_num =
end - max(start, -histoty_partition_num);
@@ -578,7 +578,7 @@ PROPERTIES
);
```
-假设当前日期为 2020-05-29,是 2020 年的第 22 周。默认每周起始为星期一。则根于以上规则,tbl1 会产生以下分区:
+假设当前日期为 2020-05-29,是 2020 年的第 22 周。默认每周起始为星期一。则以上规则,tbl1 会产生以下分区:
```Plain
p2020_22: ["2020-05-25 00:00:00", "2020-06-01 00:00:00")
diff --git a/versioned_docs/version-2.0/table-design/data-partition.md
b/versioned_docs/version-2.0/table-design/data-partition.md
index 661bf7cbe9b6..46d400e2f121 100644
--- a/versioned_docs/version-2.0/table-design/data-partition.md
+++ b/versioned_docs/version-2.0/table-design/data-partition.md
@@ -24,44 +24,47 @@ specific language governing permissions and limitations
under the License.
-->
-# Data Partition
+This document mainly introduces table creation and data partitioning in Doris,
as well as potential problems and solutions encountered during table creation
operations.
-This topic is about table creation and data partitioning in Doris, including
the common problems in table creation and their solutions.
+## Basic concepts
-## Basic Concepts
-
-In Doris, data is logically described in the form of table.
+In Doris, data is logically described in the form of tables.
### Row & Column
-A table contains rows and columns.
+A table consists of rows and columns:
-Row refers to a row of user data. Column is used to describe different fields
in a row of data.
+- Row: Represents a single line of user data;
+- Column: Used to describe different fields in a row of data;
+- Columns can be divided into two types: Key and Value. From a business
perspective, Key and Value can correspond to dimension columns and metric
columns, respectively. The key columns in Doris are those specified in the
table creation statement, which are the columns following the keywords `unique
key`, `aggregate key`, or `duplicate key`. The remaining columns are value
columns. From the perspective of the aggregation model, rows with the same Key
columns will be aggregated into a sing [...]
-Columns can be divided into two categories: Key and Value. From a business
perspective, Key and Value correspond to dimension columns and metric columns,
respectively. The key column of Doris is the column specified in the table
creation statement. The column after the keyword 'unique key' or 'aggregate
key' or 'duplicate key' in the table creation statement is the key column, and
the rest except the key column is the value column. In the Aggregate Model,
rows with the same values in Key [...]
+### Partition & Tablet
-### Tablet & Partition
+Doris supports two levels of data partitioning. The first level is Partition, which supports Range and List partitioning. The second level is Bucket (also known as Tablet), which supports Hash and Random bucketing. If no partitioning is specified during table creation, Doris generates a default partition that is transparent to the user. When using the default partition, only bucketing is supported.
-In the Doris storage engine, user data are horizontally divided into data
tablets (also known as data buckets). Each tablet contains several rows of
data. The data between the individual tablets do not intersect and is
physically stored independently.
+In the Doris storage engine, data is horizontally partitioned into several
tablets. Each tablet contains several rows of data. There is no overlap between
the data in different tablets, and they are stored physically independently.
-Tablets are logically attributed to different Partitions. One Tablet belongs
to only one Partition, and one Partition contains several Tablets. Since the
tablets are physically stored independently, the partitions can be seen as
physically independent, too. Tablet is the smallest physical storage unit for
data operations such as movement and replication.
+Multiple tablets logically belong to different partitions. A single tablet
belongs to only one partition, while a partition contains several tablets.
Because tablets are stored physically independently, partitions can also be
considered physically independent. The tablet is the smallest physical storage
unit for operations such as data movement and replication.
-A Table is formed of multiple Partitions. Partition can be thought of as the
smallest logical unit of management. Data import and deletion can be performed
on only one Partition.
+Several partitions compose a table. The partition can be considered the
smallest logical management unit.
-## Data Partitioning
+Benefits of two-level data partitioning:
-The following illustrates on data partitioning in Doris using the example of a
CREATE TABLE operation.
+- For dimensions with time or similar ordered values, such dimension columns
can be used as partitioning columns. The partition granularity can be evaluated
based on import frequency and partition data volume.
+- Historical data deletion: If there is a need to delete historical data (for example, retaining only the data of the most recent several days), composite partitioning can be used to achieve this goal by dropping historical partitions. Alternatively, DELETE statements can be sent within specified partitions to delete data.
+- Solving data skew issues: Each partition can specify its number of buckets independently. For example, when partitioning by day and the data volume differs significantly between days, the number of buckets can be set per partition to distribute the data reasonably. It is recommended to choose a column with high cardinality as the bucketing column (see the sketch after this list).
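+
+A minimal sketch of the data-skew point above, assuming a hypothetical day-partitioned table `site_visits` that is hash-bucketed by `user_id`; an unusually busy day is given more buckets when its partition is added:
+
+```
+-- Hypothetical table and numbers: the partition for a heavy day gets 32 buckets
+ALTER TABLE site_visits ADD PARTITION p20240101
+VALUES [("2024-01-01"), ("2024-01-02"))
+DISTRIBUTED BY HASH(`user_id`) BUCKETS 32;
+```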
-CREATE TABLE in Doris is a synchronous command. It returns results after the
SQL execution is completed. Successful returns indicate successful table
creation. For more information on the syntax, please refer to [CREATE
TABLE](../sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE.md),
or input the `HELP CREATE TABLE;` command.
+### Example of creating a table
-This section introduces how to create tables in Doris.
+CREATE TABLE in Doris is a synchronous command. It returns results after the
SQL execution is completed. Successful returns indicate successful table
creation. For more information, please refer to [CREATE
TABLE](../sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE),
or input the `HELP CREATE TABLE;` command.
-```sql
--- Range Partition
+This section introduces how to create tables in Doris using range partitioning and hash buckets.
-CREATE TABLE IF NOT EXISTS example_db.example_range_tbl
+```
+-- Range Partition
+CREATE TABLE IF NOT EXISTS example_range_tbl
(
- `user_id` LARGEINT NOT NULL COMMENT "User ID",
+ `user_id` LARGEINT NOT NULL COMMENT "User ID",
`date` DATE NOT NULL COMMENT "Date when the data are imported",
`timestamp` DATETIME NOT NULL COMMENT "Timestamp when the data are
imported",
`city` VARCHAR(20) COMMENT "User location city",
@@ -70,355 +73,977 @@ CREATE TABLE IF NOT EXISTS example_db.example_range_tbl
`last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT
"User last visit time",
`cost` BIGINT SUM DEFAULT "0" COMMENT "Total user consumption",
`max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum user dwell time",
- `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time"
+ `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time"
)
-ENGINE=olap
+ENGINE=OLAP
AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
PARTITION BY RANGE(`date`)
(
- PARTITION `p201701` VALUES LESS THAN ("2017-02-01"),
- PARTITION `p201702` VALUES LESS THAN ("2017-03-01"),
- PARTITION `p201703` VALUES LESS THAN ("2017-04-01"),
- PARTITION `p2018` VALUES [("2018-01-01"), ("2019-01-01"))
+ PARTITION `p201701` VALUES [("2017-01-01"), ("2017-02-01")),
+ PARTITION `p201702` VALUES [("2017-02-01"), ("2017-03-01")),
+ PARTITION `p201703` VALUES [("2017-03-01"), ("2017-04-01"))
)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
PROPERTIES
(
- "replication_num" = "3",
- "storage_medium" = "SSD",
- "storage_cooldown_time" = "2018-01-01 12:00:00"
+ "replication_num" = "1"
);
+```
+
+Here, the AGGREGATE KEY data model is used as an example. In the AGGREGATE KEY data model, all columns that are specified with an aggregation type (SUM, REPLACE, MAX, or MIN) are Value columns. The rest are the Key columns.
+
+Relevant parameters can be set in the PROPERTIES section at the end of the CREATE TABLE statement. For a detailed description of these parameters, refer to the documentation on [CREATE TABLE](../sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE).
+
+The default ENGINE type is OLAP. In Doris, only the OLAP engine type has its data managed and stored by Doris itself. Other engine types, such as mysql, broker, and es, are essentially just mappings to tables in external databases or systems, which allow Doris to read that data. Doris itself does not create, manage, or store any tables or data for non-OLAP engine types.
+
+`IF NOT EXISTS` indicates that if the table has not been created before, it
will be created. Note that this only checks if the table name exists and does
not check if the schema of the new table is the same as the schema of an
existing table. Therefore, if there is a table with the same name but a
different schema, this command will also return successfully, but it does not
mean that a new table and a new schema have been created.
+
+### View partition
+
+You can use the `show create table` command to view the partition information
of a table.
+
+```
+> show create table example_range_tbl
++-------------------+---------------------------------------------------------------------------------------------------------+
+| Table | Create Table
|
++-------------------+---------------------------------------------------------------------------------------------------------+
+| example_range_tbl | CREATE TABLE `example_range_tbl` (
|
+| | `user_id` largeint(40) NOT NULL COMMENT 'User ID',
|
+| | `date` date NOT NULL COMMENT 'Date when the data are
imported', |
+| | `timestamp` datetime NOT NULL COMMENT 'Timestamp when
the data are imported', |
+| | `city` varchar(20) NULL COMMENT 'User location city',
|
+| | `age` smallint(6) NULL COMMENT 'User age',
|
+| | `sex` tinyint(4) NULL COMMENT 'User gender',
|
+| | `last_visit_date` datetime REPLACE NULL DEFAULT
"1970-01-01 00:00:00" COMMENT 'User last visit time', |
+| | `cost` bigint(20) SUM NULL DEFAULT "0" COMMENT 'Total
user consumption', |
+| | `max_dwell_time` int(11) MAX NULL DEFAULT "0" COMMENT
'Maximum user dwell time', |
+| | `min_dwell_time` int(11) MIN NULL DEFAULT "99999"
COMMENT 'Minimum user dwell time' |
+| | ) ENGINE=OLAP
|
+| | AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`,
`age`, `sex`) |
+| | COMMENT 'OLAP'
|
+| | PARTITION BY RANGE(`date`)
|
+| | (PARTITION p201701 VALUES [('0000-01-01'),
('2017-02-01')), |
+| | PARTITION p201702 VALUES [('2017-02-01'),
('2017-03-01')), |
+| | PARTITION p201703 VALUES [('2017-03-01'),
('2017-04-01'))) |
+| | DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
|
+| | PROPERTIES (
|
+| | "replication_allocation" = "tag.location.default: 1",
|
+| | "is_being_synced" = "false",
|
+| | "storage_format" = "V2",
|
+| | "light_schema_change" = "true",
|
+| | "disable_auto_compaction" = "false",
|
+| | "enable_single_replica_compaction" = "false"
|
+| | );
|
++-------------------+---------------------------------------------------------------------------------------------------------+
+```
+
+You can use the `show partitions from your_table` command to view the detailed partition information of a table.
+
+```
+> show partitions from example_range_tbl
++-------------+---------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------
++---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+
+| PartitionId | PartitionName | VisibleVersion | VisibleVersionTime | State
| PartitionKey | Range
| DistributionKey | Buckets | ReplicationNum | StorageMedium
+| CooldownTime | RemoteStoragePolicy | LastConsistencyCheckTime |
DataSize | IsInMemory | ReplicaAllocation | IsMutable |
++-------------+---------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------
++---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+
+| 28731 | p201701 | 1 | 2024-01-25 10:50:51 | NORMAL
| date | [types: [DATEV2]; keys: [0000-01-01]; ..types: [DATEV2]; keys:
[2017-02-01]; ) | user_id | 16 | 1 | HDD
+| 9999-12-31 23:59:59 | | | 0.000 |
false | tag.location.default: 1 | true |
+| 28732 | p201702 | 1 | 2024-01-25 10:50:51 | NORMAL
| date | [types: [DATEV2]; keys: [2017-02-01]; ..types: [DATEV2]; keys:
[2017-03-01]; ) | user_id | 16 | 1 | HDD
+| 9999-12-31 23:59:59 | | | 0.000 |
false | tag.location.default: 1 | true |
+| 28733 | p201703 | 1 | 2024-01-25 10:50:51 | NORMAL
| date | [types: [DATEV2]; keys: [2017-03-01]; ..types: [DATEV2]; keys:
[2017-04-01]; ) | user_id | 16 | 1 | HDD
+| 9999-12-31 23:59:59 | | | 0.000 |
false | tag.location.default: 1 | true |
++-------------+---------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------
++---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+
+```
+
+### Alter partition
+
+You can add a new partition by using the `alter table add partition` command.
+
+```
+ALTER TABLE example_range_tbl ADD PARTITION p201704 VALUES LESS THAN("2017-05-01") DISTRIBUTED BY HASH(`user_id`) BUCKETS 5;
+```
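+
+You can also drop an existing partition with the `alter table drop partition` command; a minimal sketch, reusing partition `p201701` from the example above:
+
+```
+-- Removes partition p201701 (and the data it holds) from the table
+ALTER TABLE example_range_tbl DROP PARTITION p201701;
+```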
+
+For more partition modification operations, please refer to the SQL manual on
[ALTER-TABLE-PARTITION](../sql-manual/sql-reference/Data-Definition-Statements/Alter/ALTER-TABLE-PARTITION).
+
+## Manual partitioning
+
+### Partition columns
+- Partition columns can be specified as one or multiple columns, and the partition columns must be KEY columns. The usage of multi-column partitioning is introduced later in the multi-column partitioning sections.
+- When `allowPartitionColumnNullable` is set to true, Range partition supports NULL partition columns. List partition never supports NULL partition columns.
+- Regardless of the type of the partition column, double quotes are required when writing partition values (see the example after this list).
+- There is theoretically no upper limit on the number of partitions.
+- When creating a table without partitioning, the system will automatically
generate a full-range partition with the same name as the table name. This
partition is not visible to users and cannot be deleted or modified.
+- Overlapping ranges are not allowed when creating partitions.
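+
+As a small illustration of the double-quote rule above, a sketch with a hypothetical INT partition column `age`; the partition bounds are quoted even though the column is numeric:
+
+```
+-- `age` is an INT column, yet its partition bounds are written as quoted strings
+PARTITION BY RANGE(`age`)
+(
+    PARTITION `p_young` VALUES LESS THAN ("30"),
+    PARTITION `p_mid` VALUES LESS THAN ("60")
+)
+```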
--- List Partition
+### Range partition
-CREATE TABLE IF NOT EXISTS example_db.example_list_tbl
+Partition columns are usually time columns for convenient management of old
and new data. Range partition supports column types such as DATE, DATETIME,
TINYINT, SMALLINT, INT, BIGINT, and LARGEINT.
+
+Partition information supports four writing methods:
+
+- FIXED RANGE: Defines each partition as a left-closed, right-open interval.
+
+```
+PARTITION BY RANGE(col1[, col2, ...])
+(
+ PARTITION partition_name1 VALUES [("k1-lower1", "k2-lower1",
"k3-lower1",...), ("k1-upper1", "k2-upper1", "k3-upper1", ...)),
+ PARTITION partition_name2 VALUES [("k1-lower1-2", "k2-lower1-2", ...),
("k1-upper1-2", MAXVALUE, ))
+)
+```
+
+For example:
+
+```
+PARTITION BY RANGE(`date`)
(
- `user_id` LARGEINT NOT NULL COMMENT "User ID",
- `date` DATE NOT NULL COMMENT "Date when the data are imported",
- `timestamp` DATETIME NOT NULL COMMENT "Timestamp when the data are
imported",
- `city` VARCHAR(20) COMMENT "User location city",
- `age` SMALLINT COMMENT "User Age",
- `sex` TINYINT COMMENT "User gender",
- `last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT
"User last visit time",
- `cost` BIGINT SUM DEFAULT "0" COMMENT "Total user consumption",
- `max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum user dwell time",
- `min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum user dwell time"
+ PARTITION `p201701` VALUES [("2017-01-01"), ("2017-02-01")),
+ PARTITION `p201702` VALUES [("2017-02-01"), ("2017-03-01")),
+ PARTITION `p201703` VALUES [("2017-03-01"), ("2017-04-01"))
)
-ENGINE=olap
-AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
-PARTITION BY LIST(`city`)
+```
+
+- LESS THAN: Only defines the upper bound of each partition. The lower bound is determined by the upper bound of the previous partition.
+
+```
+PARTITION BY RANGE(col1[, col2, ...])
+(
+ PARTITION partition_name1 VALUES LESS THAN MAXVALUE | ("value1", "value2",
...),
+ PARTITION partition_name2 VALUES LESS THAN MAXVALUE | ("value1", "value2",
...)
+)
+```
+
+For example:
+
+```
+PARTITION BY RANGE(`date`)
+(
+ PARTITION `p201701` VALUES LESS THAN ("2017-02-01"),
+ PARTITION `p201702` VALUES LESS THAN ("2017-03-01"),
+ PARTITION `p201703` VALUES LESS THAN ("2017-04-01")
+)
+
+PARTITION BY RANGE(`date`)
+(
+ PARTITION `p201701` VALUES LESS THAN ("2017-02-01"),
+ PARTITION `p201702` VALUES LESS THAN ("2017-03-01"),
+    PARTITION `p201703` VALUES LESS THAN ("2017-04-01"),
+ PARTITION `other` VALUES LESS THAN (MAXVALUE)
+)
+```
+
+- BATCH RANGE: Creates RANGE partitions of numeric or time types in batches, defining the partitions as left-closed, right-open intervals and setting the step size.
+
+```
+PARTITION BY RANGE(int_col)
+(
+ FROM (start_num) TO (end_num) INTERVAL interval_value
+)
+
+PARTITION BY RANGE(date_col)
+(
+ FROM ("start_date") TO ("end_date") INTERVAL num YEAR | num MONTH | num
WEEK | num DAY | 1 HOUR
+)
+```
+
+For example:
+
+```
+PARTITION BY RANGE(age)
+(
+ FROM (1) TO (100) INTERVAL 10
+)
+
+PARTITION BY RANGE(`date`)
+(
+ FROM ("2000-11-14") TO ("2021-11-14") INTERVAL 2 YEAR
+)
+```
+
+- MULTI RANGE: Creates RANGE partitions in batches, defining the partitions as left-closed, right-open intervals. For example:
+
+```
+PARTITION BY RANGE(col)
+(
+ FROM ("2000-11-14") TO ("2021-11-14") INTERVAL 1 YEAR,
+ FROM ("2021-11-14") TO ("2022-11-14") INTERVAL 1 MONTH,
+ FROM ("2022-11-14") TO ("2023-01-03") INTERVAL 1 WEEK,
+ FROM ("2023-01-03") TO ("2023-01-14") INTERVAL 1 DAY,
+ PARTITION p_20230114 VALUES [('2023-01-14'), ('2023-01-15'))
+)
+```
+
+### List partition
+
+Partition columns support data types such as BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, LARGEINT, DATE, DATETIME, CHAR, and VARCHAR. Partition values are enumerated values. A partition is hit only when the data matches one of the enumerated values of the target partition.
+
+Partitions support specifying the enumerated values contained in each
partition through VALUES IN (...).
+
+For example:
+
+```
+PARTITION BY LIST(city)
(
PARTITION `p_cn` VALUES IN ("Beijing", "Shanghai", "Hong Kong"),
PARTITION `p_usa` VALUES IN ("New York", "San Francisco"),
PARTITION `p_jp` VALUES IN ("Tokyo")
)
-DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
+```
+
+List partition also supports multi-column partitioning, for example:
+
+```
+PARTITION BY LIST(id, city)
+(
+ PARTITION p1_city VALUES IN (("1", "Beijing"), ("1", "Shanghai")),
+ PARTITION p2_city VALUES IN (("2", "Beijing"), ("2", "Shanghai")),
+ PARTITION p3_city VALUES IN (("3", "Beijing"), ("3", "Shanghai"))
+)
+```
+
+## Dynamic partition
+
+Dynamic partition is designed to manage the Time-to-Live (TTL) of partitions, reducing the burden on users.
+
+In some scenarios, users partition their tables by day and perform routine tasks every day. In that case, they need to manage partitions manually; otherwise, data loads may fail because the required partition has not been created, which brings additional maintenance costs.
+
+With dynamic partitioning, users can define rules for partition creation and
deletion when establishing tables. The FE initiates a background thread to
handle partition creation or deletion based on these user-defined rules. Users
also have the flexibility to modify these rules during runtime.
+
+It's important to note that dynamic partitioning is exclusively supported by
range partitions. Currently, the functionality enables dynamic addition and
deletion of partitions.
+
+:::tip
+
+This feature will be disabled when synchronized by CCR. If this table is
copied by CCR, that is, PROPERTIES contains `is_being_synced = true`, it will
be displayed as enabled in show create table, but will not actually take
effect. When `is_being_synced` is set to `false`, these features will resume
working, but the `is_being_synced` property is for CCR peripheral modules only
and should not be manually set during CCR synchronization.
+
+:::
+
+### How to use
+
+The rules for dynamic partitioning can be specified when the table is created
or modified at runtime.
+
+Currently, dynamic partition rules can only be set for partitioned tables with a single partition column.
+
+- Specified when creating table
+
+```
+CREATE TABLE tbl1
+(...)
PROPERTIES
(
- "replication_num" = "3",
- "storage_medium" = "SSD",
- "storage_cooldown_time" = "2018-01-01 12:00:00"
-);
+ "dynamic_partition.prop1" = "value1",
+ "dynamic_partition.prop2" = "value2",
+ ...
+)
+```
+
+- Modify at runtime
```
+ALTER TABLE tbl1 SET
+(
+ "dynamic_partition.prop1" = "value1",
+ "dynamic_partition.prop2" = "value2",
+ ...
+)
+```
+
+### Rule parameters
+
+The rules of dynamic partition are prefixed with `dynamic_partition.`:
+
+- `dynamic_partition.enable`
+
+    Whether to enable the dynamic partition feature. Can be specified as `TRUE` or `FALSE`. If not specified, the default is `TRUE`. If it is `FALSE`, Doris will ignore the dynamic partitioning rules of the table.
+
+- `dynamic_partition.time_unit` (required parameter)
+
+    The unit for dynamic partition scheduling. Can be specified as `HOUR`, `DAY`, `WEEK`, `MONTH`, or `YEAR`, meaning that partitions are created or deleted by hour, day, week, month, or year, respectively.
+
+ When specified as `HOUR`, the suffix format of the dynamically created
partition name is `yyyyMMddHH`, for example, `2020032501`. *When the time unit
is HOUR, the data type of partition column cannot be DATE.*
+
+ When specified as `DAY`, the suffix format of the dynamically created
partition name is `yyyyMMdd`, for example, `20200325`.
+
+ When specified as `WEEK`, the suffix format of the dynamically created
partition name is `yyyy_ww`. That is, the week of the year of current date. For
example, the suffix of the partition created for `2020-03-25` is `2020_13`,
indicating that it is currently the 13th week of 2020.
+
+ When specified as `MONTH`, the suffix format of the dynamically created
partition name is `yyyyMM`, for example, `202003`.
+
+ When specified as `YEAR`, the suffix format of the dynamically created
partition name is `yyyy`, for example, `2020`.
-### Definition of Column
+- `dynamic_partition.time_zone`
-Here we only use the AGGREGATE KEY data model as an example. See [Doris Data
Model](./data-model.md) for more information.
+    The time zone of the dynamic partition. If not specified, it defaults to the system time zone of the current machine, such as `Asia/Shanghai`. For the supported time zones, see [Timezone](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones).
-You can view the basic types of columns by executing `HELP CREATE TABLE;` in
MySQL Client.
+- `dynamic_partition.start`
-In the AGGREGATE KEY data model, all columns that are specified with an
aggregation type (SUM, REPLACE, MAX, or MIN) are Value columns. The rest are
the Key columns.
+    The starting offset of the dynamic partition, usually a negative number. Depending on the `time_unit` attribute, partitions whose range falls before this offset (relative to the current day, week, or month) will be deleted. If not specified, the default is `-2147483648`, which means history partitions will not be deleted.
-A few suggested rules for defining columns include:
+- `dynamic_partition.end` (required parameter)
-1. The Key columns must precede all Value columns.
-2. Try to choose the INT type as much as possible. Because calculations and
lookups on INT types are much more efficient than those on strings.
-3. For the lengths of the INT types, follow the **good enough** principle.
-4. For the lengths of the VARCHAR and STRING types, also follow the **good
enough** principle.
+    The end offset of the dynamic partition, usually a positive number. Depending on the `time_unit` attribute, partitions for the corresponding range are created in advance, based on the current day (week / month).
-### Partitioning and Bucketing
-Doris supports two layers of data partitioning. The first level is Partition,
including range partitioning and list partitioning. The second is Bucket
(Tablet), including hash and random partitioning.
+- `dynamic_partition.prefix` (required parameter)
-It is also possible to use one layer of data partitioning, If you do not write
the partition statement when creating the table, Doris will generate a default
partition at this time, which is transparent to the user. In this case, it only
supports data bucketing.
+ The dynamically created partition name prefix.
-1. Partition
- * You can specify one or more columns as the partitioning columns, but they
have to be KEY columns. The usage of multi-column partitions is described
further below.
- * Regardless of the type of the partitioning columns, double quotes are
required for partition values.
- * There is no theoretical limit on the number of partitions.
- * If users create a table without specifying the partitions, the system
will automatically generate a Partition with the same name as the table. This
Partition contains all data in the table and is neither visible to users nor
modifiable.
- * **Partitions should not have overlapping ranges**.
+- `dynamic_partition.buckets`
- #### Range Partitioning
+ The number of buckets corresponding to the dynamically created partitions.
- * Partitioning columns are usually time columns for easy management of old
and new data.
+- `dynamic_partition.replication_num`
- * Range partitioning support column type:
[DATE,DATETIME,TINYINT,SMALLINT,INT,BIGINT,LARGEINT]
+    The replication number of the dynamic partitions. If not specified, it defaults to the table's replication number.
- * Range partitioning supports specifying only the upper bound by `VALUES
LESS THAN (...)`. The system will use the upper bound of the previous partition
as the lower bound of the next partition, and generate a left-closed right-open
interval. It also supports specifying both the upper and lower bounds by
`VALUES [...)`, and generate a left-closed right-open interval.
+- `dynamic_partition.start_day_of_week`
- * The following takes the `VALUES [...)` method as an example since it is
more comprehensible. It shows how the partition ranges change as we use the
`VALUES LESS THAN (...)` statement to add or delete partitions:
+    When `time_unit` is `WEEK`, this parameter is used to specify the starting point of the week. The value ranges from 1 to 7, where 1 is Monday and 7 is Sunday. The default is 1, which means that every week starts on Monday.
- * As in the `example_range_tbl` example above, when the table is created,
the following 3 partitions are automatically generated:
- ```
- P201701: [MIN_VALUE, 2017-02-01)
- P201702: [2017-02-01, 2017-03-01)
- P201703: [2017-03-01, 2017-04-01)
- ```
+- `dynamic_partition.start_day_of_month`
- * If we add Partition p201705 VALUES LESS THAN ("2017-06-01"), the
results will be as follows:
+    When `time_unit` is `MONTH`, this parameter is used to specify the start date of each month. The value ranges from 1 to 28, where 1 means the 1st of every month and 28 means the 28th. The default is 1, which means that every month starts on the 1st. The 29th, 30th, and 31st are not supported for now, to avoid ambiguity caused by leap years or leap months.
- ```
- P201701: [MIN_VALUE, 2017-02-01)
- P201702: [2017-02-01, 2017-03-01)
- P201703: [2017-03-01, 2017-04-01)
- P201705: [2017-04-01, 2017-06-01)
- ```
- * Then we delete Partition p201703, the results will be as follows:
+- `dynamic_partition.create_history_partition`
- ```
- p201701: [MIN_VALUE, 2017-02-01)
- p201702: [2017-02-01, 2017-03-01)
- p201705: [2017-04-01, 2017-06-01)
- ```
+ The default is false. When set to true, Doris will automatically create all
partitions, as described in the creation rules below. At the same time, the
parameter `max_dynamic_partition_num` of FE will limit the total number of
partitions to avoid creating too many partitions at once. When the number of
partitions expected to be created is greater than `max_dynamic_partition_num`,
the operation will fail.
- > Note that the partition range of p201702 and p201705 has not changed,
and there is a gap between the two partitions: [2017-03-01, 2017-04-01). That
means, if the imported data is within this gap range, the import would fail.
+ When the `start` attribute is not specified, this parameter has no effect.
- * Now we go on and delete Partition p201702, the results will be as
follows:
+- `dynamic_partition.history_partition_num`
- ```
- p201701: [MIN_VALUE, 2017-02-01)
- p201705: [2017-04-01, 2017-06-01)
- ```
+ When `create_history_partition` is `true`, this parameter is used to specify
the number of history partitions. The default value is -1, which means it is
not set.
- > The gap range expands to: [2017-02-01, 2017-04-01)
+- `dynamic_partition.hot_partition_num`
- * Then we add Partition p201702new VALUES LESS THAN ("2017-03-01"), the
results will be as follows:
+    Specify how many of the latest partitions are hot partitions. For hot partitions, the system automatically sets their `storage_medium` parameter to SSD and sets `storage_cooldown_time`.
- ```
- p201701: [MIN_VALUE, 2017-02-01)
- p201702new: [2017-02-01, 2017-03-01)
- p201705: [2017-04-01, 2017-06-01)
- ```
+ :::tip
- > The gap range shrinks to: [2017-03-01, 2017-04-01)
+ If there is no SSD disk path under the storage path, configuring this
parameter will cause dynamic partition creation to fail.
- * Now we delete Partition p201701 and add Partition p201612 VALUES LESS
THAN ("2017-01-01"), the partition result is as follows:
+ :::
- ```
- p201612: [MIN_VALUE, 2017-01-01)
- p201702new: [2017-02-01, 2017-03-01)
- p201705: [2017-04-01, 2017-06-01)
- ```
+    `hot_partition_num` covers all partitions of the previous n days as well as all future partitions.
- > This results in a new gap range: [2017-01-01, 2017-02-01)
+ Let us give an example. Suppose today is 2021-05-20, partition by day, and
the properties of dynamic partition are set to: hot_partition_num=2, end=3,
start=-3. Then the system will automatically create the following partitions,
and set the `storage_medium` and `storage_cooldown_time` properties:
+
+ ```
+ p20210517: ["2021-05-17", "2021-05-18") storage_medium=HDD
storage_cooldown_time=9999-12-31 23:59:59
+ p20210518: ["2021-05-18", "2021-05-19") storage_medium=HDD
storage_cooldown_time=9999-12-31 23:59:59
+ p20210519: ["2021-05-19", "2021-05-20") storage_medium=SSD
storage_cooldown_time=2021-05-21 00:00:00
+ p20210520: ["2021-05-20", "2021-05-21") storage_medium=SSD
storage_cooldown_time=2021-05-22 00:00:00
+ p20210521: ["2021-05-21", "2021-05-22") storage_medium=SSD
storage_cooldown_time=2021-05-23 00:00:00
+ p20210522: ["2021-05-22", "2021-05-23") storage_medium=SSD
storage_cooldown_time=2021-05-24 00:00:00
+ p20210523: ["2021-05-23", "2021-05-24") storage_medium=SSD
storage_cooldown_time=2021-05-25 00:00:00
+ ```
+
+
+- `dynamic_partition.reserved_history_periods`
+
+    The range of reserved history periods. It should be in the form of `[yyyy-MM-dd,yyyy-MM-dd],[...,...]` when `dynamic_partition.time_unit` is DAY, WEEK, MONTH, or YEAR, and in the form of `[yyyy-MM-dd HH:mm:ss,yyyy-MM-dd HH:mm:ss],[...,...]` when `dynamic_partition.time_unit` is HOUR. No extra spaces are expected. The default value is `"NULL"`, which means it is not set.
+
+    Let us give an example. Suppose today is 2021-09-06, partitioned by day, and the properties of dynamic partition are set to:
+
+ ```time_unit="DAY/WEEK/MONTH/YEAR", end=3, start=-3,
reserved_history_periods="[2020-06-01,2020-06-20],[2020-10-31,2020-11-15]"```.
+
+    The system will automatically reserve the partitions in the following periods:
+
+ ```
+ ["2020-06-01","2020-06-20"],
+ ["2020-10-31","2020-11-15"]
+ ```
+
+ or
+
+ ```time_unit="HOUR", end=3, start=-3, reserved_history_periods="[2020-06-01
00:00:00,2020-06-01 03:00:00]"```.
+
+    The system will automatically reserve the partitions in the following periods:
+
+ ```
+ ["2020-06-01 00:00:00","2020-06-01 03:00:00"]
+ ```
- In summary, the deletion of a partition does not change the range of the
existing partitions, but might result in gaps. When a partition is added via
the `VALUES LESS THAN` statement, the lower bound of one partition is the upper
bound of its previous partition.
+    Each `[...,...]` in `reserved_history_periods` is a pair of values that must be set together, and the first date cannot be later than the second one.
- In addition to the single-column partitioning mentioned above, Range
Partitioning also supports **multi-column partitioning**. Examples are as
follows:
- ```text
- PARTITION BY RANGE(`date`, `id`)
- (
- PARTITION `p201701_1000` VALUES LESS THAN ("2017-02-01", "1000"),
- PARTITION `p201702_2000` VALUES LESS THAN ("2017-03-01", "2000"),
- PARTITION `p201703_all` VALUES LESS THAN ("2017-04-01")
- )
+- `dynamic_partition.storage_medium`
+
+ <version since="1.2.3"></version>
+
+ Specifies the default storage medium for the created dynamic partition. HDD
is the default, SSD can be selected.
+
+    Note that when set to SSD, the `hot_partition_num` property will no longer take effect; all partitions will default to the SSD storage medium and the cooldown time will be 9999-12-31 23:59:59.
+
+#### Create history partition rules
+
+When `create_history_partition` is `true`, i.e. history partition creation is
enabled, Doris determines the number of history partitions to be created based
on `dynamic_partition.start` and `dynamic_partition.history_partition_num`.
+
+Assuming the number of history partitions to be created is
`expect_create_partition_num`, the number is as follows according to different
settings.
+
+- `create_history_partition` = `true`
+ - `dynamic_partition.history_partition_num` is not set, i.e. -1.
+ `expect_create_partition_num` = `end` - `start`;
+ - `dynamic_partition.history_partition_num` is set
+ `expect_create_partition_num` = `end` - max(`start`,
`-history_partition_num`);
+- `create_history_partition` = `false`
+
+No history partition will be created, `expect_create_partition_num` = `end` -
0;
+
+When `expect_create_partition_num` is greater than `max_dynamic_partition_num` (default 500), partition creation is prohibited to avoid creating too many partitions at once.
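+
+If more history partitions are genuinely needed, the FE limit can be raised at runtime; a hedged sketch, assuming the `max_dynamic_partition_num` FE config is mutable like the FE configs shown later in this document:
+
+```
+-- Raise the cap before enabling a rule that creates many history partitions
+ADMIN SET FRONTEND CONFIG ("max_dynamic_partition_num" = "1000");
+```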
+
+**Examples:**
+
+Suppose today is 2021-05-20, partitioned by day, and the attributes of dynamic partition are set to `create_history_partition=true, end=3, start=-3, history_partition_num=1`; the system will automatically create the following partitions:
+
+```
+p20210519
+p20210520
+p20210521
+p20210522
+p20210523
+```
+
+If `history_partition_num=5` and the rest of the attributes are kept as above, the system will automatically create the following partitions:
+
+```
+p20210517
+p20210518
+p20210519
+p20210520
+p20210521
+p20210522
+p20210523
+```
+
+If `history_partition_num=-1`, i.e. the number of history partitions is not set, and the rest of the attributes are kept as above, the system will automatically create the following partitions:
+
+```
+p20210517
+p20210518
+p20210519
+p20210520
+p20210521
+p20210522
+p20210523
+```
+
+### Example
+
+1. Table `tbl1` has partition column k1 of type DATE. Create a dynamic partition rule: partition by day, keep only the partitions of the last 7 days, and create partitions for the next 3 days in advance.
+
+ ```
+ CREATE TABLE tbl1
+ (
+ k1 DATE,
+ ...
+ )
+ PARTITION BY RANGE(k1) ()
+ DISTRIBUTED BY HASH(k1)
+ PROPERTIES
+ (
+ "dynamic_partition.enable" = "true",
+ "dynamic_partition.time_unit" = "DAY",
+ "dynamic_partition.start" = "-7",
+ "dynamic_partition.end" = "3",
+ "dynamic_partition.prefix" = "p",
+ "dynamic_partition.buckets" = "32"
+ );
```
- In the above example, we specify `date` (DATE type) and `id` (INT type) as
the partitioning columns, so the resulting partitions will be as follows:
+ Suppose the current date is 2020-05-29. According to the above rules, tbl1
will produce the following partitions:
- ``` text
- *p201701_1000: [(MIN_VALUE, MIN_VALUE), ("2017-02-01", "1000") )
- *p201702_2000: [("2017-02-01", "1000"), ("2017-03-01", "2000") )
- *p201703_all: [("2017-03-01", "2000"), ("2017-04-01", MIN_VALUE))
- ```
+ ```
+ p20200529: ["2020-05-29", "2020-05-30")
+ p20200530: ["2020-05-30", "2020-05-31")
+ p20200531: ["2020-05-31", "2020-06-01")
+ p20200601: ["2020-06-01", "2020-06-02")
+ ```
- Note that in the last partition, the user only specifies the partition
value of the `date` column, so the system fills in `MIN_VALUE` as the partition
value of the `id` column by default. When data are imported, the system will
compare them with the partition values in order, and put the data in their
corresponding partitions. Examples are as follows:
+    On the next day, 2020-05-30, a new partition will be created: `p20200602: ["2020-06-02", "2020-06-03")`
- ``` text
- * Data --> Partition
- * 2017-01-01, 200 --> p201701_1000
- * 2017-01-01, 2000 --> p201701_1000
- * 2017-02-01, 100 --> p201701_1000
- * 2017-02-01, 2000 --> p201702_2000
- * 2017-02-15, 5000 --> p201702_2000
- * 2017-03-01, 2000 --> p201703_all
- * 2017-03-10, 1 --> p201703_all
- * 2017-04-01, 1000 --> Unable to import
- * 2017-05-01, 1000 --> Unable to import
- ```
+    On 2020-06-06, because `dynamic_partition.start` is set to -7, the partition from 7 days ago will be deleted, that is, the partition `p20200529` will be deleted.
+2. Table tbl1 has partition column k1 of type DATETIME. Create a dynamic partition rule: partition by week, keep only the partitions of the last 2 weeks, and create partitions for the next 2 weeks in advance.
-<version since="1.2.0">
-
+ ```
+ CREATE TABLE tbl1
+ (
+ k1 DATETIME,
+ ...
+ )
+ PARTITION BY RANGE(k1) ()
+ DISTRIBUTED BY HASH(k1)
+ PROPERTIES
+ (
+ "dynamic_partition.enable" = "true",
+ "dynamic_partition.time_unit" = "WEEK",
+ "dynamic_partition.start" = "-2",
+ "dynamic_partition.end" = "2",
+ "dynamic_partition.prefix" = "p",
+ "dynamic_partition.buckets" = "8"
+ );
+ ```
-Range partitioning also supports batch partitioning. For example, you can
create multiple partitions that are divided by day at a time using the `FROM
("2022-01-03") TO ("2022-01-06") INTERVAL 1 DAY`: 2022-01-03 to 2022-01-06 (not
including 2022-01-06), the results will be as follows:
+ Suppose the current date is 2020-05-29, which is the 22nd week of 2020. The
default week starts on Monday. Based on the above rules, tbl1 will produce the
following partitions:
- p20220103: [2022-01-03, 2022-01-04)
- p20220104: [2022-01-04, 2022-01-05)
- p20220105: [2022-01-05, 2022-01-06)
+ ```
+ p2020_22: ["2020-05-25 00:00:00", "2020-06-01 00:00:00")
+ p2020_23: ["2020-06-01 00:00:00", "2020-06-08 00:00:00")
+ p2020_24: ["2020-06-08 00:00:00", "2020-06-15 00:00:00")
+ ```
-</version>
-
+    The start date of each partition is the Monday of that week. Because the type of the partition column k1 is DATETIME, the partition values also include the hour, minute, and second fields, which are all 0.
-#### List Partitioning
+    On 2020-06-15, the 25th week, the partition from 2 weeks ago will be deleted, i.e. `p2020_22` will be deleted.
-* The partitioning columns support the `BOOLEAN, TINYINT, SMALLINT, INT,
BIGINT, LARGEINT, DATE, DATETIME, CHAR, VARCHAR` data types, and the partition
values are enumeration values. Partitions can be only hit if the data is one of
the enumeration values in the target partition.
+    In the above example, suppose the user specifies the start day of the week as `"dynamic_partition.start_day_of_week" = "3"`, that is, sets Wednesday as the start of the week. The partitions would be as follows:
-* List partitioning supports using `VALUES IN (...) ` to specify the
enumeration values contained in each partition.
+ ```
+ p2020_22: ["2020-05-27 00:00:00", "2020-06-03 00:00:00")
+ p2020_23: ["2020-06-03 00:00:00", "2020-06-10 00:00:00")
+ p2020_24: ["2020-06-10 00:00:00", "2020-06-17 00:00:00")
+ ```
-* The following example illustrates how partitions change when adding or
deleting a partition.
+ That is, the partition ranges from Wednesday of the current week to Tuesday
of the next week.
- * As in the `example_list_tbl` example above, when the table is created, the
following three partitions are automatically created.
+ :::tip
- ```text
- p_cn: ("Beijing", "Shanghai", "Hong Kong")
- p_usa: ("New York", "San Francisco")
- p_jp: ("Tokyo")
- ```
+    2019-12-31 and 2020-01-01 are in the same week. If the starting date of the partition is 2019-12-31, the partition name is `p2019_53`; if the starting date of the partition is 2020-01-01, the partition name is `p2020_01`.
+ :::
- * If we add Partition p_uk VALUES IN ("London"), the results will be as
follows:
+3. Table tbl1 has partition column k1 of type DATE. Create a dynamic partition rule: partition by month, do not delete historical partitions, and create partitions for the next 2 months in advance. At the same time, set the 3rd of each month as the starting date.
- ```text
- p_cn: ("Beijing", "Shanghai", "Hong Kong")
- p_usa: ("New York", "San Francisco")
- p_jp: ("Tokyo")
- p_uk: ("London")
- ```
+ ```
+ CREATE TABLE tbl1
+ (
+ k1 DATE,
+ ...
+ )
+ PARTITION BY RANGE(k1) ()
+ DISTRIBUTED BY HASH(k1)
+ PROPERTIES
+ (
+ "dynamic_partition.enable" = "true",
+ "dynamic_partition.time_unit" = "MONTH",
+ "dynamic_partition.end" = "2",
+ "dynamic_partition.prefix" = "p",
+ "dynamic_partition.buckets" = "8",
+ "dynamic_partition.start_day_of_month" = "3"
+ );
+ ```
- * Now we delete Partition p_jp, the results will be as follows:
+ Suppose the current date is 2020-05-29. Based on the above rules, tbl1 will
produce the following partitions:
- ```text
- p_cn: ("Beijing", "Shanghai", "Hong Kong")
- p_usa: ("New York", "San Francisco")
- p_uk: ("London")
- ```
+ ```
+ p202005: ["2020-05-03", "2020-06-03")
+ p202006: ["2020-06-03", "2020-07-03")
+ p202007: ["2020-07-03", "2020-08-03")
+ ```
- List partitioning also supports **multi-column partitioning**. Examples are
as follows:
+ Because `dynamic_partition.start` is not set, the historical partition will
not be deleted.
- ```text
- PARTITION BY LIST(`id`, `city`)
- (
- PARTITION `p1_city` VALUES IN (("1", "Beijing"), ("1", "Shanghai")),
- PARTITION `p2_city` VALUES IN (("2", "Beijing"), ("2", "Shanghai")),
- PARTITION `p3_city` VALUES IN (("3", "Beijing"), ("3", "Shanghai"))
- )
+    Assuming that today is 2020-05-20 and the 28th is set as the start of each month, the partition ranges are:
+
+ ```
+ p202004: ["2020-04-28", "2020-05-28")
+ p202005: ["2020-05-28", "2020-06-28")
+ p202006: ["2020-06-28", "2020-07-28")
+ ```
+
+### Modify dynamic partition properties
+
+You can modify the properties of the dynamic partition with the following command:
+
+```
+ALTER TABLE tbl1 SET
+(
+ "dynamic_partition.prop1" = "value1",
+ ...
+);
+```
+
+The modification of certain attributes may cause conflicts. Assume that the
partition granularity was DAY and the following partitions have been created:
+
+```
+p20200519: ["2020-05-19", "2020-05-20")
+p20200520: ["2020-05-20", "2020-05-21")
+p20200521: ["2020-05-21", "2020-05-22")
+```
+
+If the partition granularity is changed to MONTH at this time, the system will try to create a partition with the range `["2020-05-01", "2020-06-01")`, which conflicts with the existing partitions, so it cannot be created. The partition with the range `["2020-06-01", "2020-07-01")` can be created normally. Therefore, the partitions between 2020-05-22 and 2020-05-30 need to be filled manually.
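+
+For reference, such a granularity change would typically be made with a statement like the following (a sketch; the conflict behavior described above applies however the property is changed):
+
+```
+-- Switch the scheduling granularity of the existing dynamic partition rule
+ALTER TABLE tbl1 SET ("dynamic_partition.time_unit" = "MONTH");
+```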
+
+### Check dynamic partition table scheduling status
+
+You can further view the scheduling of dynamic partitioned tables by using the
following command:
+
+```
+mysql> SHOW DYNAMIC PARTITION TABLES;
++-----------+--------+----------+-------------+------+--------+---------+-----------+----------------+---------------------+--------+------------------------+----------------------+-------------------------+
+| TableName | Enable | TimeUnit | Start | End | Prefix | Buckets |
StartOf | LastUpdateTime | LastSchedulerTime | State |
LastCreatePartitionMsg | LastDropPartitionMsg | ReservedHistoryPeriods |
++-----------+--------+----------+-------------+------+--------+---------+-----------+----------------+---------------------+--------+------------------------+----------------------+-------------------------+
+| d3 | true | WEEK | -3 | 3 | p | 1 |
MONDAY | N/A | 2020-05-25 14:29:24 | NORMAL | N/A
| N/A | [2021-12-01,2021-12-31] |
+| d5 | true | DAY | -7 | 3 | p | 32 | N/A
| N/A | 2020-05-25 14:29:24 | NORMAL | N/A |
N/A | NULL |
+| d4 | true | WEEK | -3 | 3 | p | 1 |
WEDNESDAY | N/A | 2020-05-25 14:29:24 | NORMAL | N/A
| N/A | NULL |
+| d6 | true | MONTH | -2147483648 | 2 | p | 8 | 3rd
| N/A | 2020-05-25 14:29:24 | NORMAL | N/A |
N/A | NULL |
+| d2 | true | DAY | -3 | 3 | p | 32 | N/A
| N/A | 2020-05-25 14:29:24 | NORMAL | N/A |
N/A | NULL |
+| d7 | true | MONTH | -2147483648 | 5 | p | 8 | 24th
| N/A | 2020-05-25 14:29:24 | NORMAL | N/A |
N/A | NULL |
++-----------+--------+----------+-------------+------+--------+---------+-----------+----------------+---------------------+--------+------------------------+----------------------+-------------------------+
+7 rows in set (0.02 sec)
+```
+
+- LastUpdateTime: The last time the dynamic partition properties were modified
+- LastSchedulerTime: The last time dynamic partition scheduling was performed
+- State: The state of the last execution of dynamic partition scheduling
+- LastCreatePartitionMsg: Error message from the last scheduled dynamic partition creation
+- LastDropPartitionMsg: Error message from the last scheduled dynamic partition deletion
+
+### Advanced operation
+
+**FE Configuration Item**
+
+- dynamic\_partition\_enable
+
+ Whether to enable Doris's dynamic partition feature. The default value is
false, which is off. This parameter only affects the partitioning operation of
dynamic partition tables, not normal tables. You can modify the parameters in
`fe.conf` and restart FE to take effect. You can also execute the following
commands at runtime to take effect:
+
+ ```
+ MySQL protocol:
+
+ `ADMIN SET FRONTEND CONFIG ("dynamic_partition_enable" = "true")`
+
+ HTTP protocol:
+
+ `curl --location-trusted -u username:password -XGET
http://fe_host:fe_http_port/api/_set_config?dynamic_partition_enable=true`
```
- In the above example, we specify `id` (INT type) and `city` (VARCHAR type)
as the partitioning columns, so the resulting partitions will be as follows:
+ To turn off dynamic partitioning globally, set this parameter to false.
+
+- dynamic\_partition\_check\_interval\_seconds
+
+    The execution frequency of the dynamic partition thread defaults to 3600 seconds (1 hour), that is, scheduling is performed every hour. You can modify the parameter in `fe.conf` and restart FE for it to take effect. You can also run the following commands at runtime to modify it:
- ```text
- * p1_city: [("1", "Beijing"), ("1", "Shanghai")]
- * p2_city: [("2", "Beijing"), ("2", "Shanghai")]
- * p3_city: [("3", "Beijing"), ("3", "Shanghai")]
```
+ MySQL protocol:
+
+ `ADMIN SET FRONTEND CONFIG ("dynamic_partition_check_interval_seconds" =
"7200")`
- When data are imported, the system will compare them with the partition
values in order, and put the data in their corresponding partitions. Examples
are as follows:
- ```text
- Data ---> Partition
- 1, Beijing ---> p1_city
- 1, Shanghai ---> p1_city
- 2, Shanghai ---> p2_city
- 3, Beijing ---> p3_city
- 1, Tianjin ---> Unable to import
- 4, Beijing ---> Unable to import
+ HTTP protocol:
+
+ `curl --location-trusted -u username:password -XGET
http://fe_host:fe_http_port/api/_set_config?dynamic_partition_check_interval_seconds=432000`
```
-2. Bucketing
+**Converting dynamic and manual partition tables to each other**
+
+For a table, dynamic and manual partitioning can be freely converted, but they cannot coexist; a table is in exactly one of the two states at any time.
- * If you use the Partition method, the `DISTRIBUTED ...` statement will
describe how data are divided among partitions. If you do not use the Partition
method, that statement will describe how data of the whole table are divided.
- * You can specify multiple columns as the bucketing columns. In Aggregate
and Unique Models, bucketing columns must be Key columns; in the Duplicate
Model, bucketing columns can be Key columns and Value columns. Bucketing
columns can either be partitioning columns or not.
- * The choice of bucketing columns is a trade-off between **query
throughput** and **query concurrency**:
+**Converting Manual Partitioning to Dynamic Partitioning**
- 1. If you choose to specify multiple bucketing columns, the data will be
more evenly distributed. However, if the query condition does not include the
equivalent conditions for all bucketing columns, the system will scan all
buckets, largely increasing the query throughput and decreasing the latency of
a single query. This method is suitable for high-throughput, low-concurrency
query scenarios.
- 2. If you choose to specify only one or a few bucketing columns, point
queries might scan only one bucket. Thus, when multiple point queries are
preformed concurrently, they might scan various buckets, with no interaction
between the IO operations (especially when the buckets are stored on various
disks). This approach is suitable for high-concurrency point query scenarios.
+If a table is not dynamically partitioned when it is created, it can be
converted to dynamic partitioning at runtime by modifying the dynamic
partitioning properties with `ALTER TABLE`, an example of which can be seen
with `HELP ALTER TABLE`.
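+
+A minimal sketch of such a conversion, assuming a table `tbl1` that was created with manual day-level range partitions on a DATE column:
+
+```
+ALTER TABLE tbl1 SET
+(
+    "dynamic_partition.enable" = "true",
+    "dynamic_partition.time_unit" = "DAY",
+    "dynamic_partition.start" = "-7",
+    "dynamic_partition.end" = "3",
+    "dynamic_partition.prefix" = "p",
+    "dynamic_partition.buckets" = "32"
+);
+```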
- * AutoBucket: Calculates the number of partition buckets based on the
amount of data. For partitioned tables, you can determine a bucket based on the
amount of data, the number of machines, and the number of disks in the
historical partition.
- * There is no theoretical limit on the number of buckets.
+When the dynamic partitioning feature is enabled, Doris will no longer allow users to manage partitions manually, but will automatically manage partitions based on the dynamic partition properties.
-3. Recommendations on the number and data volume for Partitions and Buckets.
+:::tip
- * The total number of tablets in a table is equal to (Partition num *
Bucket num).
- * The recommended number of tablets in a table, regardless of capacity
expansion, is slightly more than the number of disks in the entire cluster.
- * The data volume of a single tablet does not have an upper or lower limit
theoretically, but is recommended to be in the range of 1G - 10G. Overly small
data volume of a single tablet can impose a stress on data aggregation and
metadata management; while overly large data volume can cause trouble in data
migration and completion, and increase the cost of Schema Change or Rollup
operation failures (These operations are performed on the Tablet level).
- * For the tablets, if you cannot have the ideal data volume and the ideal
quantity at the same time, it is recommended to prioritize the ideal data
volume.
- * Upon table creation, you specify the same number of Buckets for each
Partition. However, when dynamically increasing partitions (`ADD PARTITION`),
you can specify the number of Buckets for the new partitions separately. This
feature can help you cope with data reduction or expansion.
- * Once you have specified the number of Buckets for a Partition, you may
not change it afterwards. Therefore, when determining the number of Buckets,
you need to consider the need of cluster expansion in advance. For example, if
there are only 3 hosts, and each host has only 1 disk, and you have set the
number of Buckets is only set to 3 or less, then no amount of newly added
machines can increase concurrency.
- * For example, suppose that there are 10 BEs and each BE has one disk, if
the total size of a table is 500MB, you can consider dividing it into 4-8
tablets; 5GB: 8-16 tablets; 50GB: 32 tablets; 500GB: you may consider dividing
it into partitions, with each partition about 50GB in size, and 16-32 tablets
per partition; 5TB: divided into partitions of around 50GB and 16-32 tablets
per partition.
+If `dynamic_partition.start` is set, historical partitions with a partition
range before the start offset of the dynamic partition will be deleted.
- > Note: You can check the data volume of the table using the [show
data](../sql-manual/sql-reference/Show-Statements/SHOW-DATA.md) command. Divide
the returned result by the number of copies, and you will know the data volume
of the table.
+:::
-4. About the settings and usage scenarios of Random Distribution:
+**Converting Dynamic Partitioning to Manual Partitioning**
- * If the OLAP table does not have columns of REPLACE type, set the data
bucketing mode of the table to RANDOM. This can avoid severe data skew. (When
loading data into the partition corresponding to the table, each batch of data
in a single load task will be written into a randomly selected tablet).
- * When the bucketing mode of the table is set to RANDOM, since there are no
specified bucketing columns, it is impossible to query only a few buckets, so
all buckets in the hit partition will be scanned when querying the table. Thus,
this setting is only suitable for aggregate query analysis of the table data as
a whole, but not for highly concurrent point queries.
- * If the data distribution of the OLAP table is Random Distribution, you
can set `load to single tablet` to true when importing data. In this way, when
importing large amounts of data, in one task, data will be only written in one
tablet of the corresponding partition. This can improve both the concurrency
and throughput of data import and reduce write amplification caused by data
import and compaction, and thus, ensure cluster stability.
+The dynamic partitioning feature can be disabled by executing `ALTER TABLE tbl_name SET ("dynamic_partition.enable" = "false")`, which converts the table into a manually partitioned table.
-#### Compound Partitioning vs Single Partitioning
+When the dynamic partitioning feature is disabled, Doris will no longer manage partitions automatically, and users will have to create or delete partitions manually by using `ALTER TABLE`, as shown in the sketch below.
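+The following is a minimal sketch of this workflow. The table name `tbl_name`, the partition names, and the date ranges are placeholders, assuming a table that is range-partitioned by a date column:
+
+```sql
+-- Disable dynamic partitioning; Doris stops creating and dropping partitions automatically
+ALTER TABLE tbl_name SET ("dynamic_partition.enable" = "false");
+
+-- From this point on, partitions are managed manually
+ALTER TABLE tbl_name ADD PARTITION p20240401 VALUES [("2024-04-01"), ("2024-04-02"));
+ALTER TABLE tbl_name DROP PARTITION p20240301;
+```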
-Compound Partitioning
+## Auto partition
-- The first layer of data partitioning is called Partition. Users can specify
a dimension column as the partitioning column (currently only supports columns
of INT and TIME types), and specify the value range of each partition.
-- The second layer is called Distribution, which means bucketing. Users can
perform HASH distribution on data by specifying the number of buckets and one
or more dimension columns as the bucketing columns, or perform random
distribution on data by setting the mode to Random Distribution.
+:::tip
-Compound partitioning is recommended for the following scenarios:
+Automatic partitioning is supported starting from Doris version 2.1. To use this feature, please [download Doris 2.1](https://doris.apache.org/zh-CN/download) and refer to the documentation for version 2.1.
-- Scenarios with time dimensions or similar dimensions with ordered values,
which can be used as partitioning columns. The partitioning granularity can be
evaluated based on data import frequency, data volume, etc.
-- Scenarios with a need to delete historical data: If, for example, you only
need to keep the data of the last N days), you can use compound partitioning so
you can delete historical partitions. To remove historical data, you can also
send a DELETE statement within the specified partition.
-- Scenarios with a need to avoid data skew: you can specify the number of
buckets individually for each partition. For example, if you choose to
partition the data by day, and the data volume per day varies greatly, you can
customize the number of buckets for each partition. For the choice of bucketing
column, it is recommended to select the column(s) with variety in values.
+:::
-Users can also choose for single partitioning, which is about HASH
distribution.
+The Auto Partitioning feature automatically detects, during data import, whether the corresponding partition exists. If it does not exist, the partition is created automatically and the data is imported as usual.
-### PROPERTIES
+The auto partition function mainly addresses the case where the user wants to partition a table by a certain column, but the values of that column are scattered or unpredictable, so it is difficult to create the required partitions accurately when creating or adjusting the table structure, or where the number of partitions is so large that creating them manually is too cumbersome.
-In the `PROPERTIES` section at the last of the CREATE TABLE statement, you can
set the relevant parameters. Please see [CREATE
TABLE](../sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE.md)
for a detailed introduction.
+Take a time-type partition column as an example. With the Dynamic Partition function, we support the automatic creation of new partitions to accommodate real-time data for specific time periods. For real-time user behavior logs and similar scenarios, this feature largely meets the requirements. However, in more complex scenarios, such as handling non-real-time data, the partition column is independent of the current system time and contains a large number of discrete values. At this t [...]
-### ENGINE
+Suppose our table DDL is as follows:
+
+```sql
+CREATE TABLE `DAILY_TRADE_VALUE`
+(
+ `TRADE_DATE` datev2 NOT NULL COMMENT 'Trade date',
+ `TRADE_ID` varchar(40) NOT NULL COMMENT 'Trade ID',
+ ......
+)
+UNIQUE KEY(`TRADE_DATE`, `TRADE_ID`)
+PARTITION BY RANGE(`TRADE_DATE`)
+(
+ PARTITION p_2000 VALUES [('2000-01-01'), ('2001-01-01')),
+ PARTITION p_2001 VALUES [('2001-01-01'), ('2002-01-01')),
+ PARTITION p_2002 VALUES [('2002-01-01'), ('2003-01-01')),
+ PARTITION p_2003 VALUES [('2003-01-01'), ('2004-01-01')),
+ PARTITION p_2004 VALUES [('2004-01-01'), ('2005-01-01')),
+ PARTITION p_2005 VALUES [('2005-01-01'), ('2006-01-01')),
+ PARTITION p_2006 VALUES [('2006-01-01'), ('2007-01-01')),
+ PARTITION p_2007 VALUES [('2007-01-01'), ('2008-01-01')),
+ PARTITION p_2008 VALUES [('2008-01-01'), ('2009-01-01')),
+ PARTITION p_2009 VALUES [('2009-01-01'), ('2010-01-01')),
+ PARTITION p_2010 VALUES [('2010-01-01'), ('2011-01-01')),
+ PARTITION p_2011 VALUES [('2011-01-01'), ('2012-01-01')),
+ PARTITION p_2012 VALUES [('2012-01-01'), ('2013-01-01')),
+ PARTITION p_2013 VALUES [('2013-01-01'), ('2014-01-01')),
+ PARTITION p_2014 VALUES [('2014-01-01'), ('2015-01-01')),
+ PARTITION p_2015 VALUES [('2015-01-01'), ('2016-01-01')),
+ PARTITION p_2016 VALUES [('2016-01-01'), ('2017-01-01')),
+ PARTITION p_2017 VALUES [('2017-01-01'), ('2018-01-01')),
+ PARTITION p_2018 VALUES [('2018-01-01'), ('2019-01-01')),
+ PARTITION p_2019 VALUES [('2019-01-01'), ('2020-01-01')),
+ PARTITION p_2020 VALUES [('2020-01-01'), ('2021-01-01')),
+ PARTITION p_2021 VALUES [('2021-01-01'), ('2022-01-01'))
+)
+DISTRIBUTED BY HASH(`TRADE_DATE`) BUCKETS 10
+PROPERTIES (
+ "replication_num" = "1"
+);
+```
+
+The table stores a large amount of historical business data, partitioned by the date on which the transaction occurred. As you can see, when building the table we need to create the partitions manually in advance. If the value range of the partition column changes, for example when data for 2022 is added to the table above, we need to create a new partition via [ALTER-TABLE-PARTITION](../../sql-manual/sql-reference/Data-Definition-Statements/Alter/ALTER-TABLE-PARTITION) to make changes to the table partitions (see the sketch below). [...]
+
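+As an illustration only (the partition name and range below are not part of the original DDL), manually extending the table above when data for 2022 arrives would look like this:
+
+```sql
+-- Manually add a yearly partition for 2022 before importing the new data
+ALTER TABLE `DAILY_TRADE_VALUE`
+ADD PARTITION p_2022 VALUES [('2022-01-01'), ('2023-01-01'));
+```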
+## Manual bucketing
+
+If partitions are used, the `DISTRIBUTED ...` statement describes the rules for dividing data within each partition.
+
+If partitions are not used, it describes the rules for dividing the data
across the entire table.
+
+It is also possible to specify a bucketing method for each partition
individually.
+
+Multiple bucket columns can be specified. For the Aggregate and Unique models, the bucket columns must be Key columns; for the Duplicate model, they can be either Key or Value columns. Bucket columns can be the same as or different from partition columns.
+
+The choice of bucket columns involves a trade-off between query throughput and query concurrency (see the sketch after this list):
+
+- If multiple bucket columns are selected, the data distribution will be more
uniform. If a query condition does not include equal conditions for all bucket
columns, the query will trigger simultaneous scanning of all buckets,
increasing query throughput and reducing the latency of individual queries.
This approach is suitable for high-throughput, low-concurrency query scenarios.
+- If only one or a few bucket columns are selected, a point query can trigger
scanning of just one bucket. In this case, when multiple point queries are
concurrent, there is a higher probability that they will trigger scanning of
different buckets, reducing the IO impact between queries (especially when
different buckets are distributed across different disks). Therefore, this
approach is suitable for high-concurrency point query scenarios.
+
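+A minimal sketch of the two choices, using hypothetical columns `user_id` and `event_date` that are not part of any example above:
+
+```sql
+-- (a) Multiple bucket columns: more uniform data distribution; a query without
+--     equality conditions on all bucket columns scans every bucket
+DISTRIBUTED BY HASH(`user_id`, `event_date`) BUCKETS 16
+
+-- (b) A single bucket column: a point query on user_id touches only one bucket,
+--     which suits high-concurrency point query scenarios
+DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
+```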
+### Recommendations for bucket number and data volume
+
+- The total number of tablets for a table is equal to (Partition num * Bucket
num).
+- Without considering expansion, it is recommended that the number of tablets
for a table be slightly more than the total number of disks in the cluster.
+- In theory, there is no upper or lower limit on the data volume of a single tablet, but a range of 1G - 10G is recommended. If the data volume of a single tablet is too small, data aggregation will be less effective and metadata management pressure will be high. If the data volume is too large, it will hinder the migration and repair of replicas and increase the cost of retrying failed operations such as Schema Change or Rollup ( [...]
+- When there is a conflict between the data volume principle and the quantity
principle of tablets, it is recommended to prioritize the data volume principle.
+- When creating a table, the bucket number for each partition is specified uniformly. However, when adding partitions dynamically (`ADD PARTITION`), the bucket number for the new partition can be specified separately (see the sketch after this list). This feature can be conveniently used to handle data reduction or expansion.
+- Once the bucket number for a partition is specified, it cannot be changed.
Therefore, when determining the bucket number, it is necessary to consider the
cluster expansion scenario in advance. For example, if there are only 3 hosts
with 1 disk each, and the bucket number is set to 3 or less, then even if more
machines are added later, the concurrency cannot be improved.
+
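+A minimal sketch of specifying a separate bucket number for a newly added partition, assuming a hypothetical table `example_tbl` that is range-partitioned by a date column and already bucketed by HASH(`user_id`):
+
+```sql
+-- The new partition gets 8 buckets, independent of the bucket number used at table creation
+ALTER TABLE example_tbl
+ADD PARTITION p20240401 VALUES [("2024-04-01"), ("2024-05-01"))
+DISTRIBUTED BY HASH(`user_id`) BUCKETS 8;
+```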
+Here are some examples, assuming there are 10 BEs, each with one disk: if a table has a total size of 500MB, 4-8 tablets can be considered; for 5GB, 8-16 tablets; for 50GB, 32 tablets; for 500GB, it is recommended to partition the table, with each partition around 50GB in size and 16-32 tablets per partition; for 5TB, the same applies: partitions of around 50GB with 16-32 tablets per partition.
+
+The data volume of a table can be viewed using the [SHOW
DATA](../sql-manual/sql-reference/Show-Statements/SHOW-DATA) command, and the
result should be divided by the number of replicas to obtain the actual data
volume of the table.
+
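+For example (`example_db.example_tbl` is a placeholder name):
+
+```sql
+-- Returns the total size of the table; divide it by the number of replicas
+-- (replication_num) to estimate the actual data volume of the table
+SHOW DATA FROM example_db.example_tbl;
+```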
+### Random distribution
+
+- If an OLAP table does not have columns of an update type (such as REPLACE aggregation columns), setting the data bucketing mode of the table to RANDOM can avoid severe data skew (see the sketch after this list). When data is imported into the corresponding partitions of the table, each batch of a single import job will randomly select a tablet to write to.
+- When the bucketing mode of a table is set to RANDOM, there is no bucketing column, so it is not possible to prune the query to a few buckets based on bucketing column values; queries on the table scan all buckets of the partitions that are hit. This setting is suitable for aggregate analysis of the table data as a whole, but not for high-concurrency point queries.
+- If the data distribution of the OLAP table is Random Distribution, then
during data import, single-tablet import mode can be set (set
`load_to_single_tablet` to true). Then, during large-volume data import, a task
will only write to one tablet when writing data to the corresponding partition.
This can improve the concurrency and throughput of data import, reduce the
write amplification caused by data import and compaction, and ensure the
stability of the cluster.
+
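+A minimal sketch of a RANDOM-distributed table; the table name, columns, and bucket number below are illustrative only:
+
+```sql
+CREATE TABLE example_log
+(
+    `log_time` datetime NOT NULL,
+    `msg`      varchar(1024)
+)
+DUPLICATE KEY(`log_time`)
+DISTRIBUTED BY RANDOM BUCKETS 16
+PROPERTIES ("replication_num" = "1");
+```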
+## Auto bucket
+
+Users often encounter various issues due to improper bucket settings. To
address this, we provide an automated approach for setting the number of
buckets, which is currently applicable only to OLAP tables.
+
+:::tip
+
+This feature is disabled when the table is synchronized by CCR, that is, when PROPERTIES contains `is_being_synced = true`. In that case it will still be displayed as enabled in `SHOW CREATE TABLE`, but it will not actually take effect. When `is_being_synced` is set to `false`, the feature resumes working. Note that the `is_being_synced` property is intended only for CCR peripheral modules and should not be set manually during CCR synchronization.
+
+:::
+
+In the past, users had to set the number of buckets manually when creating a table. The automatic bucket feature lets Apache Doris dynamically project the number of buckets, so that it always stays within a suitable range and users do not have to worry about the details of bucket counts.
+
+For the sake of clarity, this section splits bucketing into two phases, the initial bucketing and the subsequent bucketing; "initial" and "subsequent" are just terms used in this article to describe the feature clearly, and there is no such thing as an initial or subsequent bucket in Apache Doris itself.
+
+As we know from the section on bucketing above, the `BUCKET_DESC` clause is very simple, but you need to specify the number of buckets; with the automatic bucket projection feature, the number of buckets in `BUCKET_DESC` is simply changed to `AUTO` and a new property is added.
+
+```sql
+-- old version of the creation syntax for specifying the number of buckets
+DISTRIBUTED BY HASH(site) BUCKETS 20
+
+-- Newer versions use the creation syntax for automatic bucket projection
+DISTRIBUTED BY HASH(site) BUCKETS AUTO
+properties("estimate_partition_size" = "100G")
+```
+
+The new property `estimate_partition_size` indicates the estimated amount of data in a single partition. It is optional; if not given, Doris uses a default value of 10GB.
+
+As described above, a bucket of a partition is a tablet at the physical level, and for best performance it is recommended that the tablet size be in the range of 1GB - 10GB. So how does automatic bucket projection ensure that the tablet size stays within this range?
+
+To summarize, there are a few principles.
+
+- If the overall data volume is small, the number of buckets should not be set
too high
+- If the overall data volume is large, the number of buckets should be related to the total number of disks, so as to fully utilize the capacity of each BE machine and each disk
+
+### Initial bucketing projection
+
+1. Obtain a number of buckets N based on the data size. First, we divide the value of `estimate_partition_size` by 5 (assuming a data compression ratio of 5:1 when text-format data is stored in Doris). The result is mapped as follows:
+
+```
+(0, 100MB)   : take N = 1
+[100MB, 1GB) : take N = 2
+[1GB, +inf)  : one bucket per GB
+```
+
+2. Calculate a number of buckets M based on the number of BE nodes and the disk capacity of each BE node.
+
+```
+Each BE node counts as 1, and every 50GB of disk capacity counts as 1.
+
+The calculation rule for M is: M = number of BE nodes * (size of one disk / 50GB) * number of disks per BE.
+
+For example: if there are 3 BEs, and each BE has 4 disks of 500GB, then M = 3 * (500GB / 50GB) * 4 = 120.
+```
+
+3. Calculation logic to get the final number of buckets.
+
+```
+Calculate an intermediate value x = min(M, N, 128).
+
+If x < N and x < the number of BE nodes, the final number of buckets is y,
+which is the number of BE nodes; otherwise, the final number of buckets is x.
+```
+
+4. Finally, x = max(x, autobucket_min_buckets), where autobucket_min_buckets is configured in Config (the default is 1).
+
+The pseudo-code representation of the above process is as follows
+
+```
+int N = compute the N value;
+int M = compute the M value;
+
+int y = number of BE nodes;
+int x = min(M, N, 128);
+
+if (x < N && x < y) {
+ return y;
+}
+return x;
+```
+
+With the above algorithm in mind, let's introduce some examples to better
understand this part of the logic.
+
+```
+case1:
+Amount of data 100 MB, 10 BE machines, 2TB * 3 disks
+Amount of data N = 1
+BE disks M = 10* (2TB/50GB) * 3 = 1230
+x = min(M, N, 128) = 1
+Final: 1
+
+case2:
+Data volume 1GB, 3 BE machines, 500GB * 2 disks
+Amount of data N = 2
+BE disks M = 3* (500GB/50GB) * 2 = 60
+x = min(M, N, 128) = 2
+Final: 2
+
+case3:
+Data volume 100GB, 3 BE machines, 500GB * 2 disks
+Amount of data N = 20
+BE disks M = 3* (500GB/50GB) * 2 = 60
+x = min(M, N, 128) = 20
+Final: 20
+
+case4:
+Data volume 500GB, 3 BE machines, 1TB * 1 disk
+Data volume N = 100
+BE disks M = 3* (1TB / 50GB) * 1 = 63
+x = min(M, N, 128) = 63
+Final: 63
+
+case5:
+Data volume 500GB, 10 BE machines, 2TB * 3 disks
+Amount of data N = 100
+BE disks M = 10* (2TB / 50GB) * 3 = 1230
+x = min(M, N, 128) = 100
+Final: 100
+
+case 6:
+Data volume 1TB, 10 BE machines, 2TB * 3 disks
+Amount of data N = 205
+BE disks M = 10* (2TB / 50GB) * 3 = 1230
+x = min(M, N, 128) = 128
+Final: 128
+
+case 7:
+Data volume 500GB, 1 BE machine, 100TB * 1 disk
+Amount of data N = 100
+BE disk M = 1* (100TB / 50GB) * 1 = 2048
+x = min(M, N, 128) = 100
+Final: 100
+
+case 8:
+Data volume 1TB, 200 BE machines, 4TB * 7 disks
+Amount of data N = 205
+BE disks M = 200* (4TB / 50GB) * 7 = 114800
+x = min(M, N, 128) = 128
+Final: 200
+```
+
+### Subsequent bucketing projection
+
+The above is the calculation logic for the initial bucketing. Subsequent buckets can be estimated based on the partition data that already exists. The subsequent bucket size is evaluated based on the EMA[1] (short-term exponential moving average) value of up to the previous 7 partitions, which is used as the estimate_partition_size. At this point there are two ways to calculate the partition buckets, assuming partitioning by da [...]
+
+- If the partition data over the 7 days is strictly increasing day by day, then the trend value is used. There are 6 delta values:
+
+```
+S7 - S6 = delta1,
+S6 - S5 = delta2,
+...
+S2 - S1 = delta6
+```
-In this example, the ENGINE is of OLAP type, which is the default ENGINE type.
In Doris, only the OALP ENGINE type is managed and stored by Doris. Other
ENGINE types, such as MySQL, Broker, ES, are essentially mappings to tables in
other external databases or systems to ensure that Doris can read the data. And
Doris itself does not create, manage, or store any tables and data of non-OLAP
ENGINE type.
+This yields the ema(delta) value. Then, today's estimate_partition_size = S7 + ema(delta).
-### Other
+- Otherwise, the EMA of the previous days is taken directly. Today's estimate_partition_size = EMA(S1, ..., S7).
-`IF NOT EXISTS` means to create the table if it is non-existent. Note that the
system only checks the existence of table based on the table name, but not
compare the schema of the newly created table with the existing ones. So if
there exists a table of the same name but different schema, the command will
also return, but it does not mean that a new table of a new schema has been
created.
+:::tip
-## FAQ
+According to the above algorithm, both the initial number of buckets and the number of subsequent buckets can be calculated. Unlike before, when only a fixed number of buckets could be specified, the number of buckets may differ between one partition and the next as business data changes. This is transparent to the user: the user does not need to care about the exact number of buckets in each partition, and this automat [...]
-### Table Creation
+:::
-1. If a syntax error occurs in a long CREATE TABLE statement, the error
message may be incomplete. Here is a list of possible syntax errors for your
reference in manual touble shooting:
+## Common Issues
- * Incorrect syntax. Please use `HELP CREATE TABLE;`to check the relevant
syntax.
- * Reserved words. Reserved words in user-defined names should be enclosed
in backquotes ``. It is recommended that all user-defined names be enclosed in
backquotes.
- * Chinese characters or full-width characters. Non-UTF8 encoded Chinese
characters, or hidden full-width characters (spaces, punctuation, etc.) can
cause syntax errors. It is recommended that you check for these characters
using a text editor that can display non-printable characters.
+1. For longer table creation statements, the syntax error message may be incomplete. Here are some possible syntax errors to check for manually:
-2. `Failed to create partition [xxx] . Timeout`
+ - Syntax structure errors. Please carefully read [HELP CREATE
TABLE](../sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE)
and check the relevant syntax structure.
+ - Reserved words. When a user-defined name uses a reserved word, it must be enclosed in backticks ``. It is recommended to enclose all user-defined names in backticks.
+ - Chinese characters or full-width characters. Non-UTF8 encoded Chinese
characters or hidden full-width characters (spaces, punctuation, etc.) can
cause syntax errors. It is recommended to use a text editor that displays
invisible characters for inspection.
- In Doris, tables are created in the order of the partitioning granularity.
This error prompt may appear when a partition creation task fails, but it could
also appear in table creation tasks with no partitioning operations, because,
as mentioned earlier, Doris will create an unmodifiable default partition for
tables with no partitions specified.
+2. Failed to create partition [xxx]. Timeout
- This error usually pops up because the tablet creation goes wrong in BE.
You can follow the steps below for troubleshooting:
+ Doris creates tables sequentially based on partition granularity. When a
partition fails to create, this error may occur. Even if partitions are not
used, when there is a problem with table creation, `Failed to create partition`
may still be reported because, as mentioned earlier, Doris creates an
unmodifiable default partition for tables without specified partitions.
- 1. In fe.log, find the `Failed to create partition` log of the
corresponding time point. In that log, find a number pair that looks like
`{10001-10010}` . The first number of the pair is the Backend ID and the second
number is the Tablet ID. As for `{10001-10010}`, it means that on Backend ID
10001, the creation of Tablet ID 10010 failed.
- 2. After finding the target Backend, go to the corresponding be.INFO log
and find the log of the target tablet, and then check the error message.
- 3. A few common tablet creation failures include but not limited to:
- * The task is not received by BE. In this case, the tablet ID related
information will be found in be.INFO, or the creation is successful in BE but
it still reports a failure. To solve the above problems, see [Installation and
Deployment](https://doris.apache.org/docs/dev/install/standard-deployment/)
about how to check the connectivity of FE and BE.
- * Pre-allocated memory failure. It may be that the length of a row in
the table exceeds 100KB.
- * `Too many open files`. The number of open file descriptors exceeds the
Linux system limit. In this case, you need to change the open file descriptor
limit of the Linux system.
+ This error usually occurs because the BE encountered a problem when creating data tablets. You can troubleshoot by following these steps:
- If it is a timeout error, you can set `tablet_create_timeout_second=xxx`
and `max_create_table_timeout_second=xxx` in fe.conf. The default value of
`tablet_create_timeout_second=xxx` is 1 second, and that of
`max_create_table_timeout_second=xxx` is 60 seconds. The overall timeout would
be min(tablet_create_timeout_second * replication_num,
max_create_table_timeout_second). For detailed parameter settings, please check
[FE Configuration](https://doris.apache.org/docs/dev/admin-manual/ [...]
+ - In the fe.log, search for the `Failed to create partition` log entry at
the corresponding timestamp. In this log entry, you may find a series of number
pairs similar to `{10001-10010}`. The first number in the pair represents the
Backend ID, and the second number represents the Tablet ID. For example, this
number pair indicates that the creation of Tablet ID 10010 on Backend ID 10001
failed.
+ - Go to the be.INFO log of the corresponding Backend and search for tablet
ID-related logs within the corresponding time period to find error messages.
+ - Here are some common tablet creation failure errors, including but not
limited to:
+ - The BE did not receive the relevant task. In this case, you cannot find
tablet ID-related logs in be.INFO or the BE reports success but actually fails.
For these issues, please refer to the [Installation and
Deployment](../install/cluster-deployment/standard-deployment) section to check
the connectivity between FE and BE.
+ - Pre-allocated memory failure. This may be because the byte length of a
row in the table exceeds 100KB.
+ - `Too many open files`. The number of open file handles exceeds the
Linux system limit. You need to modify the handle limit of the Linux system.
-3. The build table command does not return results for a long time.
+* If there is a timeout when creating data tablets, you can also extend the
timeout by setting `tablet_create_timeout_second=xxx` and
`max_create_table_timeout_second=xxx` in the fe.conf file. By default,
`tablet_create_timeout_second` is set to 1 second, and
`max_create_table_timeout_second` is set to 60 seconds. The overall timeout is
calculated as `min(tablet_create_timeout_second * replication_num,
max_create_table_timeout_second)`. For specific parameter settings, please
refer to th [...]
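+For instance, the current values of these FE configuration items can be checked from the MySQL client before editing fe.conf (a minimal sketch; with the defaults and `replication_num = 3`, the overall timeout is min(1 * 3, 60) = 3 seconds):
+
+```sql
+-- List timeout-related FE configuration items and their current values
+ADMIN SHOW FRONTEND CONFIG LIKE "%timeout%";
+```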
- Doris's table creation command is a synchronous command. The timeout of
this command is currently set to be relatively simple, ie (tablet num *
replication num) seconds. If you create more data fragments and have fragment
creation failed, it may cause an error to be returned after waiting for a long
timeout.
+3. The table creation command does not return results for a long time.
- Under normal circumstances, the statement will return in a few seconds or
ten seconds. If it is more than one minute, it is recommended to cancel this
operation directly and go to the FE or BE log to view the related errors.
+* Doris's table creation command is a synchronous command. The timeout for
this command is currently set simply as (tablet num * replication num) seconds.
If many data tablets are created and some of them fail to create, it may result
in a long wait before returning an error.
+* Under normal circumstances, the table creation statement should return
within a few seconds or tens of seconds. If it exceeds one minute, it is
recommended to cancel the operation directly and check the relevant errors in
the FE or BE logs.
-## More Help
+## Additional Help
-For more detailed instructions on data partitioning, please refer to the
[CREATE
TABLE](../sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE.md)
command manual, or enter `HELP CREATE TABLE;` in MySQL Client.
+For more detailed information on data partitioning, you can refer to the
[CREATE
TABLE](../sql-manual/sql-reference/Data-Definition-Statements/Create/CREATE-TABLE)
command manual or enter `HELP CREATE TABLE;` in the MySQL client to get more
help information.
\ No newline at end of file
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]