This is an automated email from the ASF dual-hosted git repository.
luzhijing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 13bcd109472d [docs](doc)Update en doc (#461)
13bcd109472d is described below
commit 13bcd109472d8967882fd2bd0f1d8f01230a8841
Author: Hu Yanjun <[email protected]>
AuthorDate: Sat Mar 23 14:48:57 2024 +0800
[docs](doc)Update en doc (#461)
---
static/images/aggregate-key-model-example.png | Bin 0 -> 355467 bytes
static/images/create-table-example.png | Bin 0 -> 390782 bytes
static/images/duplicate-key-model-example.png | Bin 0 -> 372902 bytes
.../version-2.0/admin-manual/data-admin/ccr.md | 4 +-
.../admin-manual/query-admin/kill-query.md | 61 ++-
.../data-operate/delete/atomicity-replace.md | 43 +-
.../data-operate/delete/truncate-manual.md | 32 +-
.../data-operate/update/aggregate-load-update.md | 60 ++-
.../version-2.0/db-connect/database-connect.md | 6 +-
.../cluster-deployment/standard-deployment.md | 48 +-
.../install/source-install/compilation-arm.md | 4 +-
.../install/source-install/compilation-mac.md | 12 +-
.../compilation-with-ldb-toolchain.md | 12 +-
.../version-2.0/lakehouse/datalake/hive.md | 525 ++++++++++++---------
.../version-2.0/table-design/best-practice.md | 483 +++++++++++++++----
15 files changed, 937 insertions(+), 353 deletions(-)
diff --git a/static/images/aggregate-key-model-example.png
b/static/images/aggregate-key-model-example.png
new file mode 100644
index 000000000000..2474a1bcc938
Binary files /dev/null and b/static/images/aggregate-key-model-example.png
differ
diff --git a/static/images/create-table-example.png
b/static/images/create-table-example.png
new file mode 100644
index 000000000000..469c79842ec1
Binary files /dev/null and b/static/images/create-table-example.png differ
diff --git a/static/images/duplicate-key-model-example.png
b/static/images/duplicate-key-model-example.png
new file mode 100644
index 000000000000..3935adb30410
Binary files /dev/null and b/static/images/duplicate-key-model-example.png
differ
diff --git a/versioned_docs/version-2.0/admin-manual/data-admin/ccr.md
b/versioned_docs/version-2.0/admin-manual/data-admin/ccr.md
index 7f16c6822bb2..3bd8e54a836f 100644
--- a/versioned_docs/version-2.0/admin-manual/data-admin/ccr.md
+++ b/versioned_docs/version-2.0/admin-manual/data-admin/ccr.md
@@ -562,7 +562,9 @@ download_binlog_rate_limit_kbs=1024 # This configuration
limits the size to 1MB.
## IS_BEING_SYNCED
-:::tip Doris v2.0 "is_being_synced" = "true" :::
+:::tip
+Doris v2.0 "is_being_synced" = "true"
+:::
During data synchronization using CCR, replica tables (referred to as target
tables) are created in the target cluster for the tables within the
synchronization scope of the source cluster (referred to as source tables).
However, certain functionalities and attributes need to be disabled or cleared
when creating replica tables to ensure the correctness of the synchronization
process. For example:
diff --git a/versioned_docs/version-2.0/admin-manual/query-admin/kill-query.md
b/versioned_docs/version-2.0/admin-manual/query-admin/kill-query.md
index 532ac3e62629..0e0ec8ecddd4 100644
--- a/versioned_docs/version-2.0/admin-manual/query-admin/kill-query.md
+++ b/versioned_docs/version-2.0/admin-manual/query-admin/kill-query.md
@@ -24,4 +24,63 @@ specific language governing permissions and limitations
under the License.
-->
-Coming Soon
\ No newline at end of file
+# Kill Query
+## Kill connection
+
+In Doris, each connection runs in a separate thread. You can terminate a
thread using the `KILL processlist_id` statement.
+
+The `processlist_id` for the thread can be found in the Id column of the
SHOW PROCESSLIST output. Alternatively, you can use the `SELECT CONNECTION_ID()`
command to query the current connection id.
+
+Syntax:
+
+```SQL
+KILL [CONNECTION] processlist_id
+```
+
+## Kill query
+
+You can also terminate a query that is currently executing based on the
processlist_id or the query_id.
+
+Syntax:
+
+```SQL
+KILL QUERY processlist_id | query_id
+```
+
+## Example
+
+1. Check the current connection id.
+
+```SQL
+mysql> select connection_id();
++-----------------+
+| connection_id() |
++-----------------+
+| 48 |
++-----------------+
+1 row in set (0.00 sec)
+```
+
+2. Check all connection ids.
+
+```SQL
+mysql> SHOW PROCESSLIST;
++------------------+------+------+--------------------+---------------------+----------+---------+---------+------+-------+-----------------------------------+---------------------------------------------------------------------------------------+
+| CurrentConnected | Id | User | Host | LoginTime |
Catalog | Db | Command | Time | State | QueryId
| Info
|
++------------------+------+------+--------------------+---------------------+----------+---------+---------+------+-------+-----------------------------------+---------------------------------------------------------------------------------------+
+| Yes | 48 | root | 10.16.xx.xx:44834 | 2023-12-29 16:49:47 |
internal | test | Query | 0 | OK | e6e4ce9567b04859-8eeab8d6b5513e38 |
SHOW PROCESSLIST
|
+| | 50 | root | 192.168.xx.xx:52837 | 2023-12-29 16:51:34 |
internal | | Sleep | 1837 | EOF | deaf13c52b3b4a3b-b25e8254b50ff8cb |
SELECT @@session.transaction_isolation
|
+| | 51 | root | 192.168.xx.xx:52843 | 2023-12-29 16:51:35 |
internal | | Sleep | 907 | EOF | 437f219addc0404f-9befe7f6acf9a700 |
/* ApplicationName=DBeaver Ultimate 23.1.3 - Metadata */ SHOW STATUS
|
+| | 55 | root | 192.168.xx.xx:55533 | 2023-12-29 17:09:32 |
internal | test | Sleep | 271 | EOF | f02603dc163a4da3-beebbb5d1ced760c |
/* ApplicationName=DBeaver Ultimate 23.1.3 - SQLEditor <Console> */ SELECT
DATABASE() |
+| | 47 | root | 10.16.xx.xx:35678 | 2023-12-29 16:21:56 |
internal | test | Sleep | 3528 | EOF | f4944c543dc34a99-b0d0f3986c8f1c98 |
select * from test
|
++------------------+------+------+--------------------+---------------------+----------+---------+---------+------+-------+-----------------------------------+---------------------------------------------------------------------------------------+
+5 rows in set (0.00 sec)
+```
+
+3. Kill the currently running query, which will then be displayed as canceled.
+
+```SQL
+mysql> kill query 55;
+Query OK, 0 rows affected (0.01 sec)
+```
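+
+4. Kill a connection by its `processlist_id`. The id below is illustrative; use a value returned by `SHOW PROCESSLIST` or `SELECT CONNECTION_ID()`.
+
+```SQL
+KILL CONNECTION 50;
+```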
+
diff --git
a/versioned_docs/version-2.0/data-operate/delete/atomicity-replace.md
b/versioned_docs/version-2.0/data-operate/delete/atomicity-replace.md
index a3da6b10cb0c..d3a2b9a867fc 100644
--- a/versioned_docs/version-2.0/data-operate/delete/atomicity-replace.md
+++ b/versioned_docs/version-2.0/data-operate/delete/atomicity-replace.md
@@ -24,4 +24,45 @@ specific language governing permissions and limitations
under the License.
-->
-Coming Soon
\ No newline at end of file
+# Atomicity Replace
+
+Doris supports atomic table replacement operations for two tables. This is
only applicable to OLAP tables.
+
+## Applicable scenarios
+
+- Atomic overwrite operations
+- In certain cases, users may want to rewrite data in a table. However, the
"delete and load" approach causes a data invisibility window. To solve that,
Doris allows users to create a new table of the same schema using the CREATE
TABLE LIKE statement, import the new data into this new table, and then
atomically replace the old table with the new table. For atomic replacement at
the partition level, please refer to the [temporary
partition](https://doris.apache.org/docs/2.0/data-operate/d [...]
+
+## Syntax
+
+```Plain
+ALTER TABLE [db.]tbl1 REPLACE WITH TABLE tbl2
+[PROPERTIES('swap' = 'true')];
+```
+
+Replace table tbl1 with table tbl2.
+
+If `swap` is `true`, after the replacement, data in `tbl1` will be replaced by
that in `tbl2`, while data in `tbl2` will be replaced by that in `tbl1`. In
other words, the two tables will swap data.
+
+If `swap` is `false`, after the replacement, data in `tbl1` will be replaced
by that in `tbl2` and `tbl2` will be deleted.
+
+## Implementation
+
+Essentially, table replacement combines the following operations into one
atomic operation.
+
+Assume that table A is to be replaced with table B and `swap` is set to
`true`. The operations are as follows:
+
+1. Rename table B to table A.
+2. Rename table A to table B.
+
+If `swap` is set to `false`, the operations are as follows:
+
+1. Delete table A.
+2. Rename table B to table A.
+
+## Note
+
+- `swap` defaults to `true`, meaning to swap the data between two tables.
+- If `swap` is set to `false`, the table being replaced (table A) will be
deleted and cannot be recovered.
+- The replacement operation can only be implemented between two OLAP tables
and it does not check for table schema consistency.
+- The replacement operation does not change the existing privilege settings
because privilege checks are based on table names.
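+
+As an illustration of the atomic overwrite workflow described above, the following sketch rewrites a table without a data invisibility window (the table names are hypothetical):
+
+```SQL
+-- Create a new table with the same schema as the old one
+CREATE TABLE db1.sales_new LIKE db1.sales;
+
+-- Load the rewritten data into the new table (any load method works)
+INSERT INTO db1.sales_new SELECT * FROM db1.sales_staging;
+
+-- Atomically replace the old table; with 'swap' = 'false', the old data is dropped
+ALTER TABLE db1.sales REPLACE WITH TABLE sales_new PROPERTIES('swap' = 'false');
+```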
diff --git a/versioned_docs/version-2.0/data-operate/delete/truncate-manual.md
b/versioned_docs/version-2.0/data-operate/delete/truncate-manual.md
index 9debf3729078..a02bd205f244 100644
--- a/versioned_docs/version-2.0/data-operate/delete/truncate-manual.md
+++ b/versioned_docs/version-2.0/data-operate/delete/truncate-manual.md
@@ -24,4 +24,34 @@ specific language governing permissions and limitations
under the License.
-->
-Coming Soon
\ No newline at end of file
+# Truncate
+
+This statement is used to clear the data of a specified table or partition in
Doris.
+
+## Syntax
+
+```SQL
+TRUNCATE TABLE [db.]tbl[ PARTITION(p1, p2, ...)];
+```
+
+- This statement only clears the data within a table or partition but
preserves the table or partition itself.
+- Unlike DELETE, this statement can only clear the specified table or
partition as a whole and cannot be added with filter conditions.
+- Unlike DELETE, truncating data will not affect query performance.
+- The data deleted by this operation is not recoverable.
+- When using this command, the table status must be NORMAL, which means that
tables undergoing SCHEMA CHANGE cannot be truncated.
+- This command may cause ongoing imports to fail.
+
+## Example
+
+1. Truncate the table `tbl` under `example_db`.
+
+```SQL
+TRUNCATE TABLE example_db.tbl;
+```
+
+2. Truncate partitions `p1` and `p2` of table `tbl`.
+
+```SQL
+TRUNCATE TABLE tbl PARTITION(p1, p2);
+```
+
diff --git
a/versioned_docs/version-2.0/data-operate/update/aggregate-load-update.md
b/versioned_docs/version-2.0/data-operate/update/aggregate-load-update.md
index 1514201f4190..e66af6ae004d 100644
--- a/versioned_docs/version-2.0/data-operate/update/aggregate-load-update.md
+++ b/versioned_docs/version-2.0/data-operate/update/aggregate-load-update.md
@@ -24,4 +24,62 @@ specific language governing permissions and limitations
under the License.
-->
-Coming soon
\ No newline at end of file
+# Update for Aggregate Load
+
+This guide is about ingestion-based data updates for the Aggregate Key model
in Doris.
+
+## Update all columns
+
+When importing data into an Aggregate Key model in Doris by methods like
Stream Load, Broker Load, Routine Load, and Insert Into, the new values are
combined with the old values to produce new aggregated values based on the
column's aggregation function. These values might be generated during insertion
or produced asynchronously during compaction. However, when querying, users
will always receive the same returned values.
+
+## Partial column update for Aggregate Key model
+
+Tables in the Aggregate Key model are primarily used in cases with
pre-aggregation requirements rather than data updates, but Doris allows partial
column updates for them, too. Simply set the aggregation function to
`REPLACE_IF_NOT_NULL`.
+
+**Create table**
+
+For the columns that need to be updated, set the aggregation function to
`REPLACE_IF_NOT_NULL`.
+
+```Plain
+CREATE TABLE order_tbl (
+ order_id int(11) NULL,
+ order_amount int(11) REPLACE_IF_NOT_NULL NULL,
+ order_status varchar(100) REPLACE_IF_NOT_NULL NULL
+) ENGINE=OLAP
+AGGREGATE KEY(order_id)
+COMMENT 'OLAP'
+DISTRIBUTED BY HASH(order_id) BUCKETS 1
+PROPERTIES (
+"replication_allocation" = "tag.location.default: 1"
+);
+```
+
+The existing data in the table is as follows:
+
+```Plain
++----------+--------------+-----------------+
+| order_id | order_amount | order_status |
++----------+--------------+-----------------+
+| 1 | 100 | Pending Payment |
++----------+--------------+-----------------+
+1 row in set (0.01 sec)
+```
+
+**Ingest data**
+
+For Stream Load, Broker Load, Routine Load, or INSERT INTO, you can directly
write the updates to the fields.
+
+**Example**
+
+Using the same example as above, the corresponding Stream Load command would
be (no additional headers required):
+
+```Plain
+curl --location-trusted -u root: -H "column_separator:," -H
"columns:order_id,order_status" -T /tmp/update.csv
http://127.0.0.1:48037/api/db1/order_tbl/_stream_load
+```
+
+The corresponding `INSERT INTO` statement would be (no additional session
variables required):
+
+```Plain
+INSERT INTO order_tbl (order_id, order_status) values (1,'Delivery Pending');
+```
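+
+After this partial update, the untouched `order_amount` should keep its old value while `order_status` is replaced, because the omitted column is written as NULL and ignored by `REPLACE_IF_NOT_NULL`. A sketch of the verification query:
+
+```SQL
+SELECT order_id, order_amount, order_status FROM order_tbl WHERE order_id = 1;
+-- Expected: order_amount is still 100, order_status is now 'Delivery Pending'
+```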
+
+## Note
+
+The Aggregate Key model does not perform additional data processing during
data writing, so its writing performance is the same as that of other models.
However, aggregation during queries can result in performance loss.
Typical aggregation queries can be 5~10 times slower than queries on
Merge-on-Write tables in the Unique Key model.
+
+In this case, users cannot change a field from a non-NULL value to NULL,
because NULL values written in will be automatically ignored by the
REPLACE_IF_NOT_NULL aggregation function.
diff --git a/versioned_docs/version-2.0/db-connect/database-connect.md
b/versioned_docs/version-2.0/db-connect/database-connect.md
index 8aa246e2b0f6..1b5bd7cb4ede 100644
--- a/versioned_docs/version-2.0/db-connect/database-connect.md
+++ b/versioned_docs/version-2.0/db-connect/database-connect.md
@@ -101,6 +101,8 @@ ALTER SYSTEM ADD BACKEND
"be_host_ip:heartbeat_service_port";

-:::tip For successful execution of statements that are not related to specific
databases/tables in the Playground, it is necessary to randomly select a
database from the left-hand database panel. This limitation will be removed
later.
+:::tip
+For successful execution of statements that are not related to specific
databases/tables in the Playground, it is necessary to randomly select a
database from the left-hand database panel. This limitation will be removed
later.
-The current built-in web console cannot execute SET type SQL statements.
Therefore, the web console does not support statements like SET PASSWORD FOR
'user' = PASSWORD('user_password'). :::
+The current built-in web console cannot execute SET type SQL statements.
Therefore, the web console does not support statements like SET PASSWORD FOR
'user' = PASSWORD('user_password').
+:::
diff --git
a/versioned_docs/version-2.0/install/cluster-deployment/standard-deployment.md
b/versioned_docs/version-2.0/install/cluster-deployment/standard-deployment.md
index 6b6ea3fa0501..7536e8117180 100644
---
a/versioned_docs/version-2.0/install/cluster-deployment/standard-deployment.md
+++
b/versioned_docs/version-2.0/install/cluster-deployment/standard-deployment.md
@@ -83,13 +83,14 @@ Doris can run on servers with x86-64 architecture or ARM64
architecture.
| Frontend | 8+ Cores | 8+ GB | SSD or SATA, 10+ GB | 1 or 10 GbE | 1
|
| Backend | 8+ Cores | 16+ GB | SSD or SATA, 50+ GB | 1 or 10 GbE | 1
|
-:::tip Description:
-
+:::tip
+Description:
- In a validation testing environment, you can deploy the FE and BE on the
same server.
- It is generally recommended to deploy only **one BE instance on a single
machine**. Likewise, you can only deploy **one FE** on one machine.
- If you need three replicas of data, you will need at least three machines
and deploy one BE instance on each of them, instead of deploying three BE
instances on one single machine.
- **The clocks of multiple servers hosting FEs must be synchronized, allowing
a maximum clock deviation of 5 seconds.**
-- In a testing environment, you can also perform tests with just one BE. In
the actual production environment, the number of BE instances directly affects
the overall query latency. :::
+- In a testing environment, you can also perform tests with just one BE. In
the actual production environment, the number of BE instances directly affects
the overall query latency.
+:::
**Production environment**
@@ -98,11 +99,12 @@ Doris can run on servers with x86-64 architecture or ARM64
architecture.
| Frontend | 16+ Cores | 64+ GB | SSD or RAID, 100+ GB | 10 GbE | 1
|
| Backend | 16+ Cores | 64+ GB | SSD or SATA, 100+ GB | 10 GbE | 3
|
-:::tip Description:
-
+:::tip
+Description:
- In a production environment, if the FE and BE are co-located, be mindful of
resource contention issues. It is recommended to store metadata and data on
separate disks.
- BE nodes can be configured with multiple disks for storage. You can bind
multiple HDDs or SSDs to a single BE instance.
-- The performance of the cluster is dependent on the resources of the BE
nodes. The more BE nodes there are, the better the performance of Doris.
Typically, Doris can fully leverage its performance potential with 10 to 100
machines. :::
+- The performance of the cluster is dependent on the resources of the BE
nodes. The more BE nodes there are, the better the performance of Doris.
Typically, Doris can fully leverage its performance potential with 10 to 100
machines.
+:::
### Hard disk space calculation
@@ -137,7 +139,9 @@ tmpfs /tmp tmpfs nodev,nosuid
0 0
/dev/sda3 /home ext4 defaults,noatime 0 2
```
-:::caution It is not recommended to disable swap by setting vm.swappiness = 0
because this parameter has different semantics in different Linux kernel
versions. In many cases, this will not completely disable swap. :::
+:::caution
+It is not recommended to disable swap by setting vm.swappiness = 0 because
this parameter has different semantics in different Linux kernel versions. In
many cases, this will not completely disable swap.
+:::
### Check and disable system firewall
@@ -169,7 +173,9 @@ vi /etc/security/limits.conf
* hard nofile 1000000
```
-:::caution For the changes to take effect, the current user needs to log out
of the current session and log back in. :::
+:::caution
+For the changes to take effect, the current user needs to log out of the
current session and log back in.
+:::
### Modify the number of virtual memory areas
@@ -326,10 +332,10 @@ ALTER SYSTEM ADD OBSERVER
"<fe_ip_address>:<fe_edit_log_port>"
```
:::note
-
1. It is recommended to have an odd number of FE Follower nodes (including the
Master node), with a suggested number of three to achieve high availability.
2. When FE is deployed in a high availability mode (1 Master node, 2 Follower
nodes), it is advisable to increase the scalability of the FE read services by
adding an Observer FE.
-3. Typically, one FE node can handle around 10-20 BE nodes. It is recommended
to keep the total number of FE nodes below 10. :::
+3. Typically, one FE node can handle around 10-20 BE nodes. It is recommended
to keep the total number of FE nodes below 10.
+:::
**Start** **FE** **Follower node**
@@ -387,13 +393,13 @@
storage_root_path=/home/disk1/doris,medium:HDD;/home/disk2/doris,medium:SSD
```
:::note
-
1. When specifying the storage type for the storage path, at least one path
should have the storage type set as HDD.
2. If the storage type for a storage path is not explicitly declared, it
defaults to HDD.
3. Specifying the storage type as HDD or SSD is to differentiate the storage
types for the paths. It is independent of the physical storage medium. For
example, you can designate a directory on an HDD disk as SSD.
-4. The keywords for storage types, HDD and SSD, must be capitalized. :::
+4. The keywords for storage types, HDD and SSD, must be capitalized.
+:::
-5. Bind cluster IP
+6. Bind cluster IP
In a multi-network interface environment, it is necessary to explicitly
configure the "priority_networks". However, in an environment where FQDN is
configured, the "priority_networks" can be ignored.
@@ -534,17 +540,21 @@ Doris processes listen to network segments represented in
CIDR format for IP add
This parameter is mainly used to help the system select the correct network
card IP as its listening IP. For example, if the required listening IP is
192.168.0.1, you can set `priority_networks=192.168.0.0/24`. The system will
automatically scan all IPs on the machine, and only those matching the
192.168.0.0/24 network segment will be used as service listening addresses. You
can also configure multiple CIDR network segments in this parameter, such as
`priority_networks = 10.10.0.0/16; 19 [...]
-:::tip **Why use priority_networks to configure listening address segments
instead of directly setting the listening IP address in the configuration
file?**
+:::tip
+**Why use priority_networks to configure listening address segments instead of
directly setting the listening IP address in the configuration file?**
-The main reason is that Doris is a distributed cluster, and the same
configuration file will be deployed on multiple nodes. To facilitate
deployment, updates, and maintenance, it is desirable to keep all nodes'
configuration files consistent. By configuring the listening address segment
and then starting the system, the appropriate listening IP can be found based
on this network segment, thus each machine can use a single value in this
configuration. :::
+The main reason is that Doris is a distributed cluster, and the same
configuration file will be deployed on multiple nodes. To facilitate
deployment, updates, and maintenance, it is desirable to keep all nodes'
configuration files consistent. By configuring the listening address segment
and then starting the system, the appropriate listening IP can be found based
on this network segment, thus each machine can use a single value in this
configuration.
+:::
### Why do new BE nodes need to be manually added to the cluster?
After the BE node is started, you need to send a command to the FE through the
MySQL protocol or the built-in web console to join the BE node to the cluster.
-:::tip **How does the FE know which BE nodes constitute the cluster?**
+:::tip
+**How does the FE know which BE nodes constitute the cluster?**
-As a distributed database, Doris generally has many BE nodes. Doris adds BE
nodes to the cluster by sending commands to the FE. This is different from the
way how BE nodes know the addresses of FE nodes and then actively report
connections. The way of manual adding and having the FE actively connect to BE
nodes is beneficial to cluster management in many aspects. For example, it can
determine which nodes constitute the cluster and it can proactively shut down a
BE node that cannot be con [...]
+As a distributed database, Doris generally has many BE nodes. Doris adds BE
nodes to the cluster by sending commands to the FE. This is different from
having BE nodes know the addresses of FE nodes and then actively report
connections. Manually adding BE nodes and having the FE actively connect to BE
nodes is beneficial to cluster management in many aspects. For example, it can
determine which nodes constitute the cluster and it can proactively shut down a
BE node that cannot be con [...]
+:::
### How to quickly detect whether the FE has started successfully?
@@ -574,7 +584,9 @@ ALTER SYSTEM ADD BACKEND
"be_host_ip:heartbeat_service_port";

-:::tip For successful execution of statements that are not related to specific
databases/tables in the Playground, it is necessary to randomly select a
database from the left-hand database panel. This limitation will be removed
later. :::well-log-
+:::tip
+For successful execution of statements that are not related to specific
databases/tables in the Playground, it is necessary to randomly select a
database from the left-hand database panel. This limitation will be removed
later.
+:::
### Why can't I change the root password via the Web UI?
diff --git
a/versioned_docs/version-2.0/install/source-install/compilation-arm.md
b/versioned_docs/version-2.0/install/source-install/compilation-arm.md
index 0c45db114453..d60a711f546f 100644
--- a/versioned_docs/version-2.0/install/source-install/compilation-arm.md
+++ b/versioned_docs/version-2.0/install/source-install/compilation-arm.md
@@ -250,7 +250,9 @@ gcc --version
## Compile
-:::tip It is recommended to use LDB Toolchain for compilation with ARM
environment. :::
+:::tip
+It is recommended to use LDB Toolchain for compilation in an ARM environment.
+:::
When compiling Doris on an ARM platform, please disable the AVX2 and LIBUNWIND
third-party libraries.
diff --git
a/versioned_docs/version-2.0/install/source-install/compilation-mac.md
b/versioned_docs/version-2.0/install/source-install/compilation-mac.md
index e0493452476a..0a3c7a0b1bb7 100644
--- a/versioned_docs/version-2.0/install/source-install/compilation-mac.md
+++ b/versioned_docs/version-2.0/install/source-install/compilation-mac.md
@@ -42,7 +42,9 @@ brew install automake autoconf libtool pkg-config texinfo
coreutils gnu-getopt \
python@3 cmake ninja ccache bison byacc gettext wget pcre maven llvm@16
openjdk@11 npm
```
-:::tip On MacOS, since Homebrew does not provide an installation package for
JDK8, JDK11 is used here instead. Alternatively, you can manually download and
install JDK8. :::
+:::tip
+On MacOS, since Homebrew does not provide an installation package for JDK8,
JDK11 is used here instead. Alternatively, you can manually download and
install JDK8.
+:::
2. **Compile source code**
@@ -50,7 +52,9 @@ python@3 cmake ninja ccache bison byacc gettext wget pcre
maven llvm@16 openjdk@
bash build.sh
```
-:::tip The first step of compiling Doris is to download and compile
third-party libraries. You can download the pre-compiled versions of
third-party libraries provided by the Doris community. Please refer to the
instructions below for downloading **pre-compiled third-party libraries** to
speed up the build process. :::
+:::tip
+The first step of compiling Doris is to download and compile third-party
libraries. You can download the pre-compiled versions of third-party libraries
provided by the Doris community. Please refer to the instructions below for
downloading **pre-compiled third-party libraries** to speed up the build
process.
+:::
## Start
@@ -107,4 +111,6 @@ cd installed/bin
./thrift --version
```
-:::tip When running protoc and thrift, you may encounter an issue where the
binary cannot be opened due to developer verification. To resolve this, you can
go to "Security & Privacy" settings. In the "General" tab, click on the "Open
Anyway" button to confirm your intent to open the binary. Refer to:
https://support.apple.com/en-us/102445 :::
+:::tip
+When running protoc and thrift, you may encounter an issue where the binary
cannot be opened due to developer verification. To resolve this, you can go to
"Security & Privacy" settings. In the "General" tab, click on the "Open Anyway"
button to confirm your intent to open the binary. Refer to:
https://support.apple.com/en-us/102445
+:::
diff --git
a/versioned_docs/version-2.0/install/source-install/compilation-with-ldb-toolchain.md
b/versioned_docs/version-2.0/install/source-install/compilation-with-ldb-toolchain.md
index aa3dd0b87612..7526fe78514c 100644
---
a/versioned_docs/version-2.0/install/source-install/compilation-with-ldb-toolchain.md
+++
b/versioned_docs/version-2.0/install/source-install/compilation-with-ldb-toolchain.md
@@ -28,9 +28,11 @@ under the License.
This guide is about how to compile Doris using the LDB Toolchain. This method
serves as a supplement to the Docker compilation approach to help the
developers and users without a Docker environment. The recommended LDB
Toolchain version is 0.17, which includes clang-16 and gcc-11.
-:::tip LDB Toolchain is fully known as Linux Distribution Based Toolchain
Generator. It helps compile modern C++ projects on almost all Linux
distributions. :::
+:::tip
+LDB Toolchain is short for Linux Distribution Based Toolchain Generator.
It helps compile modern C++ projects on almost all Linux distributions.
Special thanks to [Amos Bird](https://github.com/amosbird) for the
contribution.
+:::
## Prepare the compilation environment:
@@ -40,7 +42,9 @@ This method applies to most Linux distributions (CentOS,
Ubuntu, etc.).
Download the latest `ldb_toolchain_gen.sh` from
[here](https://github.com/amosbird/ldb_toolchain_gen/releases). This script is
used to generate ldb toolchain.
-:::tip For more information, please visit
https://github.com/amosbird/ldb_toolchain_gen :::
+:::tip
+For more information, please visit
https://github.com/amosbird/ldb_toolchain_gen
+:::
2. **Execute the following command to generate ldb toolchain.**
@@ -105,7 +109,9 @@ export PATH=/path/to/ldb_toolchain/bin:$PATH
## Compile Doris
-:::tip The first step of compiling Doris source code is to first download
third-party libraries and compile them. You can refer to the following
instructions to download precompiled versions of the third-party libraries. :::
+:::tip
+The first step of compiling Doris source code is to first download third-party
libraries and compile them. You can refer to the following instructions to
download precompiled versions of the third-party libraries.
+:::
1. **Enter the Doris source code directory and execute the following command
to check if the compilation machine supports the AVX2 instruction set.**
diff --git a/versioned_docs/version-2.0/lakehouse/datalake/hive.md
b/versioned_docs/version-2.0/lakehouse/datalake/hive.md
index 4724d0e903b2..81450b2b7ff3 100644
--- a/versioned_docs/version-2.0/lakehouse/datalake/hive.md
+++ b/versioned_docs/version-2.0/lakehouse/datalake/hive.md
@@ -26,22 +26,22 @@ under the License.
# Hive
-By connecting to Hive Metastore, or a metadata service compatible with Hive
Metatore, Doris can automatically obtain Hive database table information and
perform data queries.
+By connecting to Hive Metastore, Doris can automatically obtain database and
table information from Hive and perform data queries.
-In addition to Hive, many other systems also use the Hive Metastore to store
metadata. So through Hive Catalog, we can not only access Hive, but also access
systems that use Hive Metastore as metadata storage. Such as Iceberg, Hudi, etc.
+In addition to Hive, systems like Iceberg and Hudi also use Hive Metastore for
metadata storage. Hive Catalog in Doris allows users to easily integrate with
not only Hive but also systems that use Hive Metastore as metadata storage.
-## Terms and Conditions
+## Note
-1. Need to put core-site.xml, hdfs-site.xml and hive-site.xml in the conf
directory of FE and BE. First read the hadoop configuration file in the conf
directory, and then read the related to the environment variable
`HADOOP_CONF_DIR` configuration file.
-2. hive supports version 1/2/3.
-3. Support Managed Table and External Table and part of Hive View.
-4. Can identify hive, iceberg, hudi metadata stored in Hive Metastore.
+- Hive 1, Hive 2, and Hive 3 are supported.
+- Managed Tables, External Tables, and part of the Hive Views are supported.
+- It can recognize Hive, Iceberg, and Hudi metadata stored in the Hive
Metastore.
+- You should place core-site.xml, hdfs-site.xml, and hive-site.xml in the conf
directory of both FE and BE. During reading, the hadoop configuration files in
the conf directory are read first, followed by the configuration files related
to the HADOOP_CONF_DIR environment variable.
## Create Catalog
-### Hive On HDFS
+### Hive on HDFS
-```sql
+```SQL
CREATE CATALOG hive PROPERTIES (
'type'='hms',
'hive.metastore.uris' = 'thrift://172.0.0.1:9083',
@@ -54,11 +54,11 @@ CREATE CATALOG hive PROPERTIES (
);
```
-In addition to the two required parameters of `type` and
`hive.metastore.uris`, more parameters can be passed to pass the information
required for the connection.
+In addition to the two required parameters of `type` and
`hive.metastore.uris`, you can specify more parameters to pass the required
information for the connection.
-If HDFS HA information is provided, the example is as follows:
+Example to provide HDFS HA information:
-```sql
+```SQL
CREATE CATALOG hive PROPERTIES (
'type'='hms',
'hive.metastore.uris' = 'thrift://172.0.0.1:9083',
@@ -71,33 +71,9 @@ CREATE CATALOG hive PROPERTIES (
);
```
-Provide HDFS HA information and Kerberos authentication information at the
same time, examples are as follows:
+### Hive on VIEWFS
-```sql
-CREATE CATALOG hive PROPERTIES (
- 'type'='hms',
- 'hive.metastore.uris' = 'thrift://172.0.0.1:9083',
- 'hive.metastore.sasl.enabled' = 'true',
- 'hive.metastore.kerberos.principal' = 'your-hms-principal',
- 'dfs.nameservices'='your-nameservice',
- 'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
- 'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:8088',
- 'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:8088',
-
'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
- 'hadoop.security.authentication' = 'kerberos',
- 'hadoop.kerberos.keytab' = '/your-keytab-filepath/your.keytab',
- 'hadoop.kerberos.principal' = '[email protected]',
- 'yarn.resourcemanager.principal' = 'your-rm-principal'
-);
-```
-
-Please place the `krb5.conf` file and `keytab` authentication file under all
`BE` and `FE` nodes. The path of the `keytab` authentication file is consistent
with the configuration. The `krb5.conf` file is placed in `/etc by default
/krb5.conf` path.
-
-The value of `hive.metastore.kerberos.principal` needs to be consistent with
the property of the same name of the connected hive metastore, which can be
obtained from `hive-site.xml`.
-
-### Hive On VIEWFS
-
-```sql
+```SQL
CREATE CATALOG hive PROPERTIES (
'type'='hms',
'hive.metastore.uris' = 'thrift://172.0.0.1:9083',
@@ -113,17 +89,17 @@ CREATE CATALOG hive PROPERTIES (
);
```
-viewfs related parameters can be added to the catalog configuration as above,
or added to `conf/core-site.xml`.
+You can add viewfs-related parameters to the catalog configuration as above,
or add them to `conf/core-site.xml`.
-How viewfs works and parameter configuration, please refer to relevant hadoop
documents, for example,
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ViewFs.html
+For details about how viewfs works and the parameter configurations, please
refer to relevant hadoop documents, such as
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/ViewFs.html
-### Hive On JuiceFS
+### Hive on JuiceFS
-Data is stored in JuiceFS, examples are as follows:
+Example for data stored on JuiceFS:
-(Need to put `juicefs-hadoop-x.x.x.jar` under `fe/lib/` and
`apache_hdfs_broker/lib/`)
+(You should put `juicefs-hadoop-x.x.x.jar` under `fe/lib/` and
`apache_hdfs_broker/lib/`.)
-```sql
+```SQL
CREATE CATALOG hive PROPERTIES (
'type'='hms',
'hive.metastore.uris' = 'thrift://172.0.0.1:9083',
@@ -134,9 +110,9 @@ CREATE CATALOG hive PROPERTIES (
);
```
-### Hive On S3
+### Hive on S3
-```sql
+```SQL
CREATE CATALOG hive PROPERTIES (
"type"="hms",
"hive.metastore.uris" = "thrift://172.0.0.1:9083",
@@ -148,15 +124,15 @@ CREATE CATALOG hive PROPERTIES (
);
```
-Options:
+Optional properties:
-* s3.connection.maximum: s3 maximum connection number, default 50
-* s3.connection.request.timeout: s3 request timeout, default 3000ms
-* s3.connection.timeout: s3 connection timeout, default 1000ms
+- s3.connection.maximum: defaults to 50
+- s3.connection.request.timeout: defaults to 3000ms
+- s3.connection.timeout: defaults to 1000ms
-### Hive On OSS
+### Hive on OSS
-```sql
+```SQL
CREATE CATALOG hive PROPERTIES (
"type"="hms",
"hive.metastore.uris" = "thrift://172.0.0.1:9083",
@@ -166,9 +142,9 @@ CREATE CATALOG hive PROPERTIES (
);
```
-### Hive On OBS
+### Hive on OBS
-```sql
+```SQL
CREATE CATALOG hive PROPERTIES (
"type"="hms",
"hive.metastore.uris" = "thrift://172.0.0.1:9083",
@@ -178,9 +154,9 @@ CREATE CATALOG hive PROPERTIES (
);
```
-### Hive On COS
+### Hive on COS
-```sql
+```SQL
CREATE CATALOG hive PROPERTIES (
"type"="hms",
"hive.metastore.uris" = "thrift://172.0.0.1:9083",
@@ -190,9 +166,24 @@ CREATE CATALOG hive PROPERTIES (
);
```
-### Hive With Glue
+### Hive with AWS Glue
+
+:::tip
+When connecting to Glue, if you are in a non-EC2 environment, you need to copy
the `~/.aws` directory from the EC2 environment to your current environment.
Alternatively, you can download the AWS
[CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)
tool for configuration, which will also create the `.aws` directory in the
current user's home directory.
+:::
+
+Doris supports two authentication methods for accessing AWS services.
-```sql
+**Authentication using Catalog properties**
+
+You can fill in the basic properties of credentials, such as:
+
+- When accessing S3, you can use `s3.endpoint`, `s3.access_key`,
`s3.secret_key`.
+- When accessing Glue, you can use `glue.endpoint`, `glue.access_key`,
`glue.secret_key`.
+
+For example, if you are accessing Glue using an Iceberg Catalog, you can fill in
the following properties to access tables hosted on Glue:
+
+```SQL
CREATE CATALOG hive PROPERTIES (
"type"="hms",
"hive.metastore.type" = "glue",
@@ -202,22 +193,33 @@ CREATE CATALOG hive PROPERTIES (
);
```
+**Authentication using system properties**
+
+This authentication method is used for applications running on AWS resources
such as EC2 instances. It avoids hard-coding credentials and enhances data
security.
+
+If you create a Catalog without filling in the Credentials properties, the
DefaultAWSCredentialsProviderChain will be used by default. It can read
properties configured in system environment variables or instance profiles.
+
+For configurations of environment variables and system properties, refer to
[AWS
CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html).
+
+- Optional environment variables include: `AWS_ACCESS_KEY_ID`,
`AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`, `AWS_ROLE_ARN`,
`AWS_WEB_IDENTITY_TOKEN_FILE`, etc.
+- You can follow [AWS
Configure](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html)
to directly configure the Credentials information, which will generate a
credentials file in the `~/.aws` directory.
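+
+For example, a minimal sketch of a Glue catalog that relies on system credentials instead of explicit keys (the catalog name and endpoint are illustrative; credentials are resolved through the DefaultAWSCredentialsProviderChain):
+
+```SQL
+CREATE CATALOG glue_hms PROPERTIES (
+    "type" = "hms",
+    "hive.metastore.type" = "glue",
+    "glue.endpoint" = "https://glue.us-east-1.amazonaws.com"
+    -- no glue.access_key / glue.secret_key: environment variables,
+    -- the ~/.aws configuration, or the instance profile are used instead
+);
+```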
+
## Metadata Cache & Refresh
-For Hive Catalog, 4 types of metadata are cached in Doris:
+For Hive Catalog, Doris caches 4 types of metadata:
-1. Table structure: cache table column information, etc.
-2. Partition value: Cache the partition value information of all partitions of
a table.
-3. Partition information: Cache the information of each partition, such as
partition data format, partition storage location, partition value, etc.
-4. File information: Cache the file information corresponding to each
partition, such as file path location, etc.
+- Table schema: table column information, etc.
+- Partition value: partition value information of all partitions in a table.
+- Partition information: information of each partition, such as partition data
format, partition storage location, partition value, etc.
+- File information: file information corresponding to each partition, such as
file path location, etc.
-The above cache information will not be persisted to Doris, so operations such
as restarting Doris's FE node, switching masters, etc. may cause the cache to
become invalid. After the cache expires, Doris will directly access the Hive
MetaStore to obtain information and refill the cache.
+The above cached information will not be persisted in Doris, so operations
such as restarting the Doris FE node or switching Masters may invalidate the
cache. Once the cache expires, Doris will directly access the Hive MetaStore to
obtain information and refill the cache.
-Metadata cache can be updated automatically, manually, or configured with TTL
(Time-to-Live) according to user needs.
+Metadata cache can be updated automatically, manually, or configured with TTL
(Time-to-Live).
### Default behavior and TTL
-By default, the metadata cache expires 10 minutes after it is first accessed.
This time is determined by the configuration parameter
`external_cache_expire_time_minutes_after_access` in fe.conf. (Note that in
versions 2.0.1 and earlier, the default value for this parameter was 1 day).
+By default, the metadata cache expires 10 minutes after it is first accessed.
The TTL is determined by the configuration parameter
`external_cache_expire_time_minutes_after_access` in fe.conf. (Note that in
versions 2.0.1 and earlier, the default value for this parameter was 1 day).
For example, if the user accesses the metadata of table A for the first time
at 10:00, then the metadata will be cached and will automatically expire after
10:10. If the user accesses the same metadata again at 10:11, Doris will
directly access the Hive MetaStore to obtain information and refill the cache.
@@ -225,7 +227,7 @@ For example, if the user accesses the metadata of table A
for the first time at
For the `INSERT INTO OVERWRITE PARTITION` operation commonly used in Hive, you
can also timely update the `File Information Cache` by configuring the TTL of
the `File Information Cache`:
-```
+```Plain
CREATE CATALOG hive PROPERTIES (
'type'='hms',
'hive.metastore.uris' = 'thrift://172.0.0.1:9083',
@@ -241,39 +243,37 @@ You can also set this value to 0 to disable file caching,
which will fetch file
Users need to manually refresh the metadata through the
[REFRESH](../../sql-manual/sql-reference/Utility-Statements/REFRESH.md) command.
-1. REFRESH CATALOG: Refresh the specified Catalog.
-
- ```
- REFRESH CATALOG ctl1 PROPERTIES("invalid_cache" = "true");
- ```
+**REFRESH CATALOG: Refresh the specified Catalog.**
- This command will refresh the database list, table list, and all cache
information of the specified Catalog.
-
- `invalid_cache` indicates whether to flush the cache. Defaults to true.
If it is false, only the database and table list of the catalog will be
refreshed, but the cache information will not be refreshed. This parameter is
applicable when the user only wants to synchronize newly added or deleted
database/table information.
+```Shell
+REFRESH CATALOG ctl1 PROPERTIES("invalid_cache" = "true");
+```
-2. REFRESH DATABASE: Refresh the specified Database.
+- This command refreshes the database list, table list, and all cached
information of the specified Catalog.
+- `invalid_cache` indicates whether to refresh the cache. It defaults to true.
If it is false, it will only refresh the database and table list of the
catalog, but not the cached information. This parameter is applicable when the
user only wants to synchronize newly added or deleted database/table
information.
- ```
- REFRESH DATABASE [ctl.]db1 PROPERTIES("invalid_cache" = "true");
- ```
+**REFRESH DATABASE: Refresh the specified Database.**
- This command will refresh the table list of the specified Database and
all cached information under the Database.
+```Shell
+REFRESH DATABASE [ctl.]db1 PROPERTIES("invalid_cache" = "true");
+```
- The meaning of the `invalid_cache` attribute is the same as above.
Defaults to true. If false, only the Database's table list will be refreshed,
not cached information. This parameter is suitable for users who only want to
synchronize newly added or deleted table information.
+- This command refreshes the table list of the specified database and all
cached information under the database.
+- `invalid_cache` indicates the same as above. It defaults to true. If false,
it will only refresh the table list of the database, but not the cached
information. This parameter is applicable when the user only wants to
synchronize newly added or deleted table information.
-3. REFRESH TABLE: Refresh the specified Table.
+**REFRESH TABLE: Refresh the specified Table.**
- ```
- REFRESH TABLE [ctl.][db.]tbl1;
- ```
+```Shell
+REFRESH TABLE [ctl.][db.]tbl1;
+```
- This command will refresh all cache information under the specified Table.
+- This command refreshes all cached information under the specified table.
### Regular refresh
-Users can set the scheduled refresh of the Catalog when creating the Catalog.
+Users can schedule the refresh for a Catalog when creating the Catalog.
-```
+```Plain
CREATE CATALOG hive PROPERTIES (
'type'='hms',
'hive.metastore.uris' = 'thrift://172.0.0.1:9083',
@@ -281,41 +281,43 @@ CREATE CATALOG hive PROPERTIES (
);
```
-In the above example, `metadata_refresh_interval_sec` means refreshing the
Catalog every 600 seconds. Equivalent to automatically executing every 600
seconds:
+In the above example, `metadata_refresh_interval_sec` means refreshing the
Catalog every 600 seconds, which means the following command will be executed
automatically every 600 seconds:
-`REFRESH CATALOG ctl1 PROPERTIES("invalid_cache" = "true");`
+```
+REFRESH CATALOG ctl1 PROPERTIES("invalid_cache" = "true");
+```
-The scheduled refresh interval must not be less than 5 seconds.
+Note that the scheduled refresh interval must not be less than 5 seconds.
-### Auto Refresh
+### Auto refresh
Currently, Doris only supports automatic update of metadata in Hive Metastore
(HMS). It perceives changes in metadata by the FE node which regularly reads
the notification events from HMS. The supported events are as follows:
| Event | Corresponding Update Operation
|
-| :-------------- |
:----------------------------------------------------------- |
+| --------------- |
------------------------------------------------------------ |
| CREATE DATABASE | Create a database in the corresponding catalog.
|
| DROP DATABASE | Delete a database in the corresponding catalog.
|
-| ALTER DATABASE | Such alterations mainly include changes in properties,
comments, or storage location of databases. They do not affect Doris' queries
in External Catalogs so they will not be synchronized. |
+| ALTER DATABASE | Such alterations mainly include changes in properties,
comments, or storage location of databases. They do not affect Doris' queries
in External Catalogs. |
| CREATE TABLE | Create a table in the corresponding database.
|
| DROP TABLE | Delete a table in the corresponding database, and
invalidate the cache of that table. |
-| ALTER TABLE | If it is a renaming, delete the table of the old name, and
then create a new table with the new name; otherwise, invalidate the cache of
that table. |
+| ALTER TABLE | If it is renaming, delete the table of the old name, and
then create a new table with the new name; otherwise, invalidate the cache of
that table. |
| ADD PARTITION | Add a partition to the cached partition list of the
corresponding table. |
| DROP PARTITION | Delete a partition from the cached partition list of the
corresponding table, and invalidate the cache of that partition. |
| ALTER PARTITION | If it is a renaming, delete the partition of the old name,
and then create a new partition with the new name; otherwise, invalidate the
cache of that partition. |
-> After data ingestion, changes in partition tables will follow the `ALTER
PARTITION` logic, while those in non-partition tables will follow the `ALTER
TABLE` logic.
->
-> If changes are conducted on the file system directly instead of through the
HMS, the HMS will not generate an event. As a result, such changes will not be
perceived by Doris.
+After data ingestion, changes in partition tables will follow the `ALTER
PARTITION` logic, while those in non-partition tables will follow the `ALTER
TABLE` logic.
+
+If changes are conducted on the file system directly instead of through the
HMS, the HMS will not generate an event. As a result, such changes will not be
perceived by Doris.
The automatic update feature involves the following parameters in fe.conf:
-1. `enable_hms_events_incremental_sync`: This specifies whether to enable
automatic incremental synchronization for metadata, which is disabled by
default.
-2. `hms_events_polling_interval_ms`: This specifies the interval between two
readings, which is set to 10000 by default. (Unit: millisecond)
-3. `hms_events_batch_size_per_rpc`: This specifies the maximum number of
events that are read at a time, which is set to 500 by default.
+- `enable_hms_events_incremental_sync`: This specifies whether to enable
automatic incremental synchronization for metadata, which is disabled by
default.
+- `hms_events_polling_interval_ms`: This specifies the interval between two
readings, which is set to 10000 by default. (Unit: millisecond)
+- `hms_events_batch_size_per_rpc`: This specifies the maximum number of events
that are read at a time, which is set to 500 by default.
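+
+As a sketch, enabling this in fe.conf could look like the following (the last two values are simply the documented defaults, shown for completeness):
+
+```Plain
+enable_hms_events_incremental_sync = true
+hms_events_polling_interval_ms = 10000
+hms_events_batch_size_per_rpc = 500
+```
+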
To enable automatic update(Excluding Huawei MRS), you need to modify the
hive-site.xml of HMS and then restart HMS and HiveServer2:
-```
+```Plain
<property>
<name>hive.metastore.event.db.notification.api.auth</name>
<value>false</value>
@@ -328,43 +330,24 @@ To enable automatic update(Excluding Huawei MRS), you
need to modify the hive-si
<name>hive.metastore.transactional.event.listeners</name>
<value>org.apache.hive.hcatalog.listener.DbNotificationListener</value>
</property>
-
```
-Huawei's MRS needs to change hivemetastore-site.xml and restart HMS and
HiveServer2:
+For Huawei MRS, you need to change hivemetastore-site.xml and restart HMS and
HiveServer2:
-```
+```Plain
<property>
<name>metastore.transactional.event.listeners</name>
<value>org.apache.hive.hcatalog.listener.DbNotificationListener</value>
</property>
```
-Note: Value is appended with commas separated from the original value, not
overwritten.For example, the default configuration for MRS 3.1.0 is
-
-```
-<property>
- <name>metastore.transactional.event.listeners</name>
-
<value>com.huawei.bigdata.hive.listener.TableKeyFileManagerListener,org.apache.hadoop.hive.metastore.listener.FileAclListener</value>
-</property>
-```
-
-We need to change to
-
-```
-<property>
- <name>metastore.transactional.event.listeners</name>
-
<value>com.huawei.bigdata.hive.listener.TableKeyFileManagerListener,org.apache.hadoop.hive.metastore.listener.FileAclListener,org.apache.hive.hcatalog.listener.DbNotificationListener</value>
-</property>
-```
-
-> Note: To enable automatic update, whether for existing Catalogs or newly
created Catalogs, all you need is to set `enable_hms_events_incremental_sync`
to `true`, and then restart the FE node. You don't need to manually update the
metadata before or after the restart.
-
## Hive Version
-Doris can correctly access the Hive Metastore in different Hive versions. By
default, Doris will access the Hive Metastore with a Hive 2.3 compatible
interface. You can also specify the hive version when creating the Catalog. If
accessing Hive 1.1.0 version:
+Doris can correctly access the Hive Metastore in different Hive versions. By
default, Doris will access the Hive Metastore with a Hive 2.3 compatible
interface. If you encounter the `Invalid method name: 'get_table_req'` error
during queries, the root cause is Hive version incompatibility.
-```sql
+You can specify the Hive version when creating the Catalog. For example, to
access Hive 1.1.0 version:
+
+```Plain
CREATE CATALOG hive PROPERTIES (
'type'='hms',
'hive.metastore.uris' = 'thrift://172.0.0.1:9083',
@@ -376,126 +359,214 @@ CREATE CATALOG hive PROPERTIES (
For Hive/Iceberg/Hudi
-| HMS Type | Doris Type | Comment |
-|---|---|---|
-| boolean| boolean | |
-| tinyint|tinyint | |
-| smallint| smallint| |
-| int| int | |
-| bigint| bigint | |
-| date| date| |
-| timestamp| datetime| |
-| float| float| |
-| double| double| |
-| char| char | |
-| varchar| varchar| |
-| decimal| decimal | |
-| `array<type>` | `array<type>`| support nested type, for example
`array<array<int>>` |
-| `map<KeyType, ValueType>` | `map<KeyType, ValueType>` | support nested type,
for example `map<string, array<int>>` |
-| `struct<col1: Type1, col2: Type2, ...>` | `struct<col1: Type1, col2: Type2,
...>` | support nested type, for example `struct<col1: array<int>, col2:
map<int, date>>` |
-| other | unsupported | |
-
-## Whether to truncate char or varchar columns according to the schema of the
hive table
-
-If the variable `truncate_char_or_varchar_columns` is enabled, when the
maximum length of the char or varchar column in the schema of the hive table is
inconsistent with the schema in the underlying parquet or orc file, it will be
truncated according to the maximum length of the hive table column.
-
-The variable default is false.
+| HMS Type | Doris Type | Comment |
+| --- | --- | --- |
+| boolean | boolean | |
+| tinyint | tinyint | |
+| smallint | smallint | |
+| int | int | |
+| bigint | bigint | |
+| date | date | |
+| timestamp | datetime | |
+| float | float | |
+| double | double | |
+| char | char | |
+| varchar | varchar | |
+| decimal | decimal | |
+| `array<type>` | `array<type>` | Supports nested structure, such as `array<map<string, int>>` |
+| `map<KeyType, ValueType>` | `map<KeyType, ValueType>` | Supports nested structure, such as `map<string, array<int>>` |
+| `struct<col1: Type1, col2: Type2, ...>` | `struct<col1: Type1, col2: Type2, ...>` | Supports nested structure, such as `struct<col1: array<int>, col2: map<int, date>>` |
+| other | unsupported | |
+
+:::tip
+**Truncate char or varchar columns**
+
+If `truncate_char_or_varchar_columns` is enabled, and the maximum length of
the char or varchar column in the schema of the hive table is inconsistent with
the schema in the underlying Parquet or ORC file, the columns will be truncated
according to the maximum length of the hive table column.
+
+The variable defaults to false.
+:::
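+
+For example, a sketch of enabling this behavior for the current session (assuming the variable is set like other session variables):
+
+```SQL
+SET truncate_char_or_varchar_columns = true;
+```
+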
## Access HMS with broker
-Add following setting when creating an HMS catalog, file splitting and
scanning for Hive external table will be completed by broker named `test_broker`
+Add the following setting when creating an HMS catalog; file splitting and
scanning for Hive external tables will then be completed by the broker named
`test_broker`.
-```sql
+```SQL
"broker.name" = "test_broker"
```
+Doris supports querying Iceberg tables in an HMS Catalog through the Broker by
implementing the Iceberg FileIO interface. If needed, you can add the following
configuration when creating an HMS Catalog.
+
+```SQL
+"io-impl" = "org.apache.doris.datasource.iceberg.broker.IcebergBrokerIO"
+```
+
## Integrate with Apache Ranger
Apache Ranger is a security framework for monitoring, enabling services, and
comprehensive data security access management on the Hadoop platform.
-Currently doris supports ranger library, table, and column permissions, but
does not support encryption, row permissions, etc.
+Doris supports authentication on a specified External Hive Catalog using Apache
Ranger.
+
+Currently, Doris supports authentication of Ranger databases, tables, and
columns, but does not support encryption, row-level permissions, or Data Mask.
-### Settings
+If you need to use Apache Ranger for authentication of the entire Doris
cluster, refer to [Integration with Apache
Ranger](https://doris.apache.org/docs/dev/admin-manual/privilege-ldap/ranger/).
-To connect to the Hive Metastore with Ranger permission verification enabled,
you need to add configuration & configuration environment:
+### Environment settings
+
+To connect to the Hive Metastore with Ranger authentication enabled, you need
to add configurations and environment settings as follows:
1. When creating a Catalog, add:
-```sql
+```SQL
"access_controller.properties.ranger.service.name" = "hive",
"access_controller.class" =
"org.apache.doris.catalog.authorizer.RangerHiveAccessControllerFactory",
```
+:::note
+ `access_controller.properties.ranger.service.name` refers to the type of
service, such as `hive`, `hdfs`, etc. It does not correspond to the value of
`ranger.plugin.hive.service.name` in the configuration file.
+:::
+
2. Configure all FE environments:
- 1. Copy the configuration files ranger-hive-audit.xml,
ranger-hive-security.xml, and ranger-policymgr-ssl.xml under the HMS conf
directory to the FE conf directory.
-
- 2. Modify the properties of ranger-hive-security.xml, the reference
configuration is as follows:
-
- ```sql
- <?xml version="1.0" encoding="UTF-8"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <configuration>
- #The directory for caching permission data, needs to be writable
- <property>
- <name>ranger.plugin.hive.policy.cache.dir</name>
- <value>/mnt/datadisk0/zhangdong/rangerdata</value>
- </property>
- #The time interval for periodically pulling permission data
- <property>
- <name>ranger.plugin.hive.policy.pollIntervalMs</name>
- <value>30000</value>
- </property>
-
- <property>
-
<name>ranger.plugin.hive.policy.rest.client.connection.timeoutMs</name>
- <value>60000</value>
- </property>
-
- <property>
-
<name>ranger.plugin.hive.policy.rest.client.read.timeoutMs</name>
- <value>60000</value>
- </property>
-
- <property>
- <name>ranger.plugin.hive.policy.rest.ssl.config.file</name>
- <value></value>
- </property>
-
- <property>
- <name>ranger.plugin.hive.policy.rest.url</name>
- <value>http://172.21.0.32:6080</value>
- </property>
-
- <property>
- <name>ranger.plugin.hive.policy.source.impl</name>
-
<value>org.apache.ranger.admin.client.RangerAdminRESTClient</value>
- </property>
-
- <property>
- <name>ranger.plugin.hive.service.name</name>
- <value>hive</value>
- </property>
-
- <property>
- <name>xasecure.hive.update.xapolicies.on.grant.revoke</name>
- <value>true</value>
- </property>
-
- </configuration>
- ```
-
- 3. In order to obtain the log of Ranger authentication itself, add the
configuration file log4j.properties in the `<doris_home>/conf` directory.
-
- 4. Restart FE.
-
-### Best Practices
-
-1. Create user user1 on the ranger side and authorize the query permission of
db1.table1.col1
-
-2. Create role role1 on the ranger side and authorize the query permission of
db1.table1.col2
-
-3. Create a user user1 with the same name in doris, user1 will directly have
the query authority of db1.table1.col1
-
-4. Create role1 with the same name in doris, and assign role1 to user1, user1
will have the query authority of db1.table1.col1 and col2 at the same time
+a. Copy the configuration files ranger-hive-audit.xml, ranger-hive-security.xml, and ranger-policymgr-ssl.xml from the HMS conf directory to the FE conf directory.
+
+b. Modify the properties in ranger-hive-security.xml. A reference configuration is as follows:
+
+```xml
+<?xml version="1.0" encoding="UTF-8"?>
+<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
+<configuration>
+  <!-- The directory for caching permission data; it must be writable -->
+ <property>
+ <name>ranger.plugin.hive.policy.cache.dir</name>
+ <value>/mnt/datadisk0/zhangdong/rangerdata</value>
+ </property>
+  <!-- The interval (in milliseconds) for periodically pulling permission data -->
+ <property>
+ <name>ranger.plugin.hive.policy.pollIntervalMs</name>
+ <value>30000</value>
+ </property>
+
+ <property>
+ <name>ranger.plugin.hive.policy.rest.client.connection.timeoutMs</name>
+ <value>60000</value>
+ </property>
+
+ <property>
+ <name>ranger.plugin.hive.policy.rest.client.read.timeoutMs</name>
+ <value>60000</value>
+ </property>
+
+ <property>
+ <name>ranger.plugin.hive.policy.rest.ssl.config.file</name>
+ <value></value>
+ </property>
+
+ <property>
+ <name>ranger.plugin.hive.policy.rest.url</name>
+ <value>http://172.21.0.32:6080</value>
+ </property>
+
+ <property>
+ <name>ranger.plugin.hive.policy.source.impl</name>
+ <value>org.apache.ranger.admin.client.RangerAdminRESTClient</value>
+ </property>
+
+ <property>
+ <name>ranger.plugin.hive.service.name</name>
+ <value>hive</value>
+ </property>
+
+ <property>
+ <name>xasecure.hive.update.xapolicies.on.grant.revoke</name>
+ <value>true</value>
+ </property>
+
+</configuration>
+```
+
+c. To obtain the logs of Ranger authentication itself, add a log4j.properties configuration file to the `<doris_home>/conf` directory.
+
+d. Restart FE.
+
+### Best Practice
+
+1. Create `user1` on the Ranger side and authorize it to query db1.table1.col1.
+2. Create `role1` on the Ranger side and authorize it to query db1.table1.col2.
+3. Create a `user1` with the same name in Doris; `user1` is then directly granted the permission to query db1.table1.col1.
+4. Create a `role1` with the same name in Doris and assign `role1` to `user1`; `user1` is then granted the permission to query both db1.table1.col1 and col2. The Doris-side statements for steps 3 and 4 are sketched after this list.
+5. The Admin and Root privileges of Doris are not affected by Apache Ranger.
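+
+A minimal sketch of the Doris-side statements for steps 3 and 4 above, assuming standard Doris user and role management syntax; the password is a placeholder:
+
+```SQL
+-- Create the user and role whose names match the Ranger side
+CREATE USER 'user1'@'%' IDENTIFIED BY 'your_password';
+CREATE ROLE role1;
+
+-- Assign the role so that user1 also inherits the permissions granted to role1
+GRANT 'role1' TO 'user1'@'%';
+```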
+
+## Authentication with Kerberos
+
+Kerberos is an authentication protocol designed to provide strong identity
verification for applications through the use of encryption with private keys.
+
+### Environment settings
+
+1. When the services in the cluster are configured with Kerberos
authentication, the relevant authentication information is needed in Hive
Catalog configuration.
+
+- `hadoop.kerberos.keytab`: the keytab file required for authentication. It should be the same keytab file used across the Doris cluster.
+- `hadoop.kerberos.principal`: Find the corresponding principal for the
hostname in the Doris cluster, such as `doris/``[email protected]`, and
verify with `klist -kt` using the keytab.
+- `yarn.resourcemanager.principal`: Go to the Yarn Resource Manager node and
retrieve this from `yarn-site.xml`. Verify the keytab of Yarn with `klist -kt`.
+- `hive.metastore.kerberos.principal`: Go to the Hive metadata service node
and retrieve this from `hive-site.xml`. Verify the keytab of Hive with `klist
-kt`.
+- `hadoop.security.authentication`: Enable Hadoop Kerberos authentication.
+
+Place the `krb5.conf` file and the `keytab` authentication file on all BE and FE nodes, and make sure the path of the `keytab` file is consistent across nodes. By default, the `krb5.conf` file is placed at `/etc/krb5.conf`. Also, confirm that the JVM parameter `-Djava.security.krb5.conf` and the environment variable `KRB5_CONFIG` point to the correct path of the `krb5.conf` file.
+
+2. After the configuration is completed, if you cannot locate the issue in the
FE and BE logs, you can enable Kerberos debugging.
+
+- On all FE and BE nodes, locate the `conf/fe.conf` and `conf/be.conf` files
in the deployment path.
+- After finding the configuration files, set the JVM parameter
`-Dsun.security.krb5.debug=true` in the `JAVA_OPTS` variable to enable Kerberos
debugging.
+- The FE Kerberos authentication debug information can be found in the FE log
path `log/fe.out`, and the BE Kerberos authentication debug information can be
found in the BE log path `log/be.out`.
+- For solutions to more common issues, refer to
[FAQ](https://doris.apache.org/docs/dev/lakehouse/faq/).
+
+### Best practice
+
+Example:
+
+```SQL
+CREATE CATALOG hive_krb PROPERTIES (
+ 'type'='hms',
+ 'hive.metastore.uris' = 'thrift://172.0.0.1:9083',
+ 'hive.metastore.sasl.enabled' = 'true',
+ 'hive.metastore.kerberos.principal' = 'your-hms-principal',
+ 'hadoop.security.authentication' = 'kerberos',
+ 'hadoop.kerberos.keytab' = '/your-keytab-filepath/your.keytab',
+ 'hadoop.kerberos.principal' = '[email protected]',
+ 'yarn.resourcemanager.principal' = 'your-rm-principal'
+);
+```
+
+Example of HDFS HA information and Kerberos authentication information:
+
+```SQL
+CREATE CATALOG hive_krb_ha PROPERTIES (
+ 'type'='hms',
+ 'hive.metastore.uris' = 'thrift://172.0.0.1:9083',
+ 'hive.metastore.sasl.enabled' = 'true',
+ 'hive.metastore.kerberos.principal' = 'your-hms-principal',
+ 'hadoop.security.authentication' = 'kerberos',
+ 'hadoop.kerberos.keytab' = '/your-keytab-filepath/your.keytab',
+ 'hadoop.kerberos.principal' = '[email protected]',
+ 'yarn.resourcemanager.principal' = 'your-rm-principal',
+ 'dfs.nameservices'='your-nameservice',
+ 'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
+ 'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:8088',
+ 'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:8088',
+
'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
+);
+```
+
+## Hive transactional table
+
+Hive transactional tables are tables that support ACID semantics in Hive. For
more details, see:
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions
+
+### Support for Hive transactional tables
+
+| Table Type | Supported Operations in Hive | Hive Table Properties | Supported Hive Versions |
+| --- | --- | --- | --- |
+| Full-ACID Transactional Table | Supports Insert, Update, Delete operations | 'transactional'='true' | 3.x, 2.x. In 2.x, loading can only be performed after major compaction is executed in Hive. |
+| Insert-Only Transactional Table | Only supports Insert operations | 'transactional'='true', 'transactional_properties'='insert_only' | 3.x, 2.x |
+
+### Current restrictions
+Currently, scenarios involving Original Files are not supported. After a table
is converted into a transactional table, subsequent newly written data files
will follow the schema of Hive Transactional table. However, existing data
files will not be converted to the schema of the Transactional table. These
files are referred to as Original Files.
diff --git a/versioned_docs/version-2.0/table-design/best-practice.md
b/versioned_docs/version-2.0/table-design/best-practice.md
index 5c4333bd7b8c..216ffcdcffd7 100644
--- a/versioned_docs/version-2.0/table-design/best-practice.md
+++ b/versioned_docs/version-2.0/table-design/best-practice.md
@@ -27,156 +27,451 @@ under the License.
# Best Practices
-## 1 tabulation
+## Data models
-### 1.1 Data Model Selection
+> Doris arranges data in three models: DUPLICATE KEY model, UNIQUE KEY model,
and AGGREGATE KEY model.
-Doris data model is currently divided into three categories: AGGREGATE KEY,
UNIQUE KEY, DUPLICATE KEY. Data in all three models are sorted by KEY.
+:::tip
+**Recommendations**
-1.1.1. AGGREGATE KEY
+Because the data model is determined and **immutable** during table creation,
it is important to select the most suitable data model.
-When AGGREGATE KEY is the same, old and new records are aggregated. The
aggregation functions currently supported are SUM, MIN, MAX, REPLACE.
+1. The Duplicate Key model is a good fit for ad-hoc queries across any
dimensions. It cannot leverage the benefits of pre-aggregation, but it is also
not constrained by the limitations of an aggregation model, so it can take
advantage of columnar storage (only reading the relevant columns without the
need to read all key columns).
+2. The Aggregate Key model, through pre-aggregation, can largely reduce the amount of data scanned and the computational workload for aggregate queries. It is particularly suitable for reporting queries with fixed patterns. However, this model is not friendly to count(*) queries. Additionally, since the aggregation method for each value column is fixed, users should pay extra attention to semantic correctness when performing other types of aggregate queries.
+3. The Unique Key model is designed for scenarios that require unique primary
key constraints. It can ensure the uniqueness of primary keys. The downside is
that it cannot reap the benefits brought by materialization and
pre-aggregation. For users with high-performance requirements for aggregate
queries, it is recommended to use the Merge-on-Write feature of the Unique Key
model since Doris 1.2.
+4. Users with partial column update requirements can select from the following data models (a sketch of the second option follows this note):
+    1. Unique Key model (Merge-on-Write mode)
+    2. Aggregate Key model (aggregated by REPLACE_IF_NOT_NULL)
+:::
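+
+For the second option above, a minimal sketch of a table whose non-key columns use REPLACE_IF_NOT_NULL, so that a load carrying NULL for a column leaves the previously stored value untouched; the table and column names are hypothetical:
+
+```SQL
+-- Partial column updates via the Aggregate Key model
+CREATE TABLE user_profile_agg
+(
+    user_id BIGINT,
+    email   VARCHAR(100) REPLACE_IF_NOT_NULL,
+    phone   VARCHAR(20)  REPLACE_IF_NOT_NULL
+)
+AGGREGATE KEY(user_id)
+DISTRIBUTED BY HASH(user_id) BUCKETS 10;
+```
+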
-AGGREGATE KEY model can aggregate data in advance and is suitable for
reporting and multi-dimensional analysis business.
+### DUPLICATE KEY model
-```
-CREATE TABLE site_visit
+
+
+When only the sorting columns are specified, rows with the same key will not
be merged.
+
+This is applicable to analytical business scenarios where data does not
require pre-aggregation:
+
+- Analysis of raw data
+- Analysis of log or time series data where only new data is appended.
+
+**Best practice**
+
+```SQL
+-- For example, log analysis where new data is only appended and duplicate keys are allowed.
+CREATE TABLE session_data
(
-siteid INT,
-City: SMALLINT,
-username VARCHAR (32),
-pv BIGINT SUM DEFAULT '0'
+ visitorid SMALLINT,
+ sessionid BIGINT,
+ visittime DATETIME,
+ city CHAR(20),
+ province CHAR(20),
+ ip varchar(32),
+    browser CHAR(20),
+ url VARCHAR(1024)
)
-AGGREGATE KEY(siteid, city, username)
-DISTRIBUTED BY HASH(siteid) BUCKETS 10;
+DUPLICATE KEY(visitorid, sessionid) -- Used solely to specify sorting columns; rows with the same KEY will not be merged.
+DISTRIBUTED BY HASH(sessionid, visitorid) BUCKETS 10;
```
-1.1.2. UNIQUE KEY
+### AGGREGATE KEY model
-When UNIQUE KEY is the same, the new record covers the old record. Before
version 1.2, UNIQUE KEY implements the same REPLACE aggregation method as
AGGREGATE KEY, and they are essentially the same. We introduced a new
merge-on-write implementation for UNIQUE KEY since version 1.2, which have
better performance on many scenarios. Suitable for analytical business with
updated requirements.
+
-```
-CREATE TABLE sales_order
+Old and new records with the same AGGREGATE KEY will be aggregated. The
currently supported aggregation methods are as follows:
+
+1. SUM: calculates the sum by accumulating the values of multiple rows;
+2. REPLACE: replaces the value in the previously imported rows with the value
from the next batch of data;
+3. MAX: retains the maximum value;
+4. MIN: retains the minimum value;
+5. REPLACE_IF_NOT_NULL: replaces non-null values. Unlike REPLACE, it does not
replace null values;
+6. HLL_UNION: aggregates columns of HLL type using the HyperLogLog algorithm;
+7. BITMAP_UNION: aggregates columns of BITMAP type using bitmap union
aggregation;
+
+This is suitable for reporting and multi-dimensional analysis scenarios such
as:
+
+- Website traffic analysis
+- Multi-dimensional analysis of data reports.
+
+**Best practice**
+
+```SQL
+-- Example of website traffic analysis
+CREATE TABLE site_visit
(
-orderid BIGINT,
-status TINYINT,
-username VARCHAR (32),
-amount BIGINT DEFAULT '0'
+ siteid INT,
+ city SMALLINT,
+ username VARCHAR(32),
+    pv BIGINT SUM DEFAULT '0' -- PV calculation
)
-KEY (orderid) UNIT
-DISTRIBUTED BY HASH(orderid) BUCKETS 10;
+AGGREGATE KEY(siteid, city, username) -- Rows with the same KEY will be
merged, and non-key columns will be aggregated based on the specified
aggregation function.
+DISTRIBUTED BY HASH(siteid) BUCKETS 10;
```
-1.1.3. DUPLICATE KEY
+### UNIQUE KEY model
-Only sort columns are specified, and the same rows are not merged. It is
suitable for the analysis business where data need not be aggregated in advance.
+A new record replaces the old record that has the same UNIQUE KEY. Before Doris 1.2, the UNIQUE KEY model was implemented in the same way as the REPLACE aggregation of the AGGREGATE KEY model. Since Doris 1.2, we have introduced a Merge-on-Write implementation for the UNIQUE KEY model, which provides better performance for aggregate queries.
-```
-CREATE TABLE session_data
+This is suitable for analytical business scenarios that require updates, such
as:
+
+- Deduplicated order analysis
+- Real-time synchronization of inserts, updates, and deletes.
+
+**Best practice**
+
+```SQL
+-- Example of deduplicated order analysis
+CREATE TABLE sales_order
(
-visitorid SMALLINT,
-sessionid BIGINT,
-visit time DATETIME,
-City CHAR (20),
-province CHAR(20),
-ip. varchar (32),
-brower CHAR(20),
-url: VARCHAR (1024)
+ orderid BIGINT,
+ status TINYINT,
+ username VARCHAR(32),
+ amount BIGINT DEFAULT '0'
)
-DUPLICATE KEY (visitor time, session time)
-DISTRIBUTED BY HASH(sessionid, visitorid) BUCKETS 10;
+UNIQUE KEY(orderid) -- Rows of the same KEY will be merged
+DISTRIBUTED BY HASH(orderid) BUCKETS 10;
```
-### 1.2 Wide Table vs. Star Schema
-
-When the business side builds tables, in order to adapt to the front-end
business, they often do not distinguish between dimension information and
indicator information, and define the Schema as a large wide table, this
operation is actually not so friendly to the database, we recommend users to
use the star model.
+## Index
-* There are many fields in Schema, and there may be more key columns in the
aggregation model. The number of columns that need to be sorted in the import
process will increase.
-* Dimensional information updates are reflected in the whole table, and the
frequency of updates directly affects the efficiency of queries.
+> Indexes can facilitate quick filtering and searching of data. Currently,
Doris supports two types of indexes:
+>
+> 1. Built-in smart indexes, including prefix index and ZoneMap index.
+> 2. User-created secondary indexes, including inverted index, BloomFilter
index, Ngram BloomFilter index, and Bitmap index.
-In the process of using Star Schema, users are advised to use Star Schema to
distinguish dimension tables from indicator tables as much as possible.
Frequently updated dimension tables can also be placed in MySQL external
tables. If there are only a few updates, they can be placed directly in Doris.
When storing dimension tables in Doris, more copies of dimension tables can be
set up to improve Join's performance.
+### **Prefix index**
-### 1.3 Partitioning and Bucketing
+Prefix indexes are built-in indexes in the Aggregate, Unique, and Duplicate
data models. The underlying data storage is sorted and stored based on the
columns specified as the AGGREGATE KEY, UNIQUE KEY, or DUPLICATE KEY in their
respective table creation statements. Prefix index, built on top of the sorted
data, allows for quick data querying based on given prefix columns.
-Doris supports two-level partitioned storage. The first level is partition,
which currently supports both RANGE and LIST partition types, and the second
layer is HASH bucket.
+Prefix indexes are sparse indexes and cannot locate the exact row where a key
is present. Instead, they can only roughly identify the range where the key may
exist, and then use binary search algorithms to accurately locate the position
of the key.
-1.3.1. Partitioning
+:::tip
+**Recommendations**
-Partition is used to divide data into different intervals, which can be
logically understood as dividing the original table into multiple sub-tables.
Data can be easily managed by partition, for example, to delete data more
quickly.
+1. When creating a table, **the correct column order can greatly improve query efficiency**.
+    1. Since the column order is specified during table creation, a table can only have one prefix index. However, this may not be efficient enough for queries on columns that are not covered by the prefix index. In such cases, users can adjust the column order by creating materialized views, as sketched after this list.
+2. The first field in a prefix index should be the field that appears most frequently in query conditions, and it should be a high-cardinality field.
+    1. Bucketing field: it should have a relatively even data distribution and be frequently used in queries, preferably a high-cardinality field.
+    2. Int(4) + Int(4) + varchar(50): the prefix index length is only 28 bytes.
+    3. Int(4) + varchar(50) + Int(4): the prefix index length is only 24 bytes.
+    4. varchar(10) + varchar(50): the prefix index length is only 30 bytes.
+    5. Prefix index (36 bytes): queries filtering on the first field deliver the best performance. If a varchar field is encountered, the prefix index is truncated after the first 20 bytes of that field.
+    6. If possible, include the most frequently used query fields in the prefix index. Otherwise, specify them as bucketing fields.
+3. The field lengths in the prefix index should be as explicit as possible
because Doris can only utilize the prefix index for the first 36 bytes.
+4. If it is difficult to design a partitioning, bucketing, and prefix index
strategy for your data range, consider introducing inverted indexes for
acceleration.
+:::
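+
+A minimal sketch of adjusting the column order for a different query pattern with a single-table materialized view, reusing the `session_data` table from the Duplicate Key example above; the view name is hypothetical:
+
+```SQL
+-- Serve queries that filter by city first, without changing the base table's column order
+CREATE MATERIALIZED VIEW mv_session_by_city AS
+SELECT city, visitorid, sessionid, visittime, province, ip, browser, url
+FROM session_data;
+```
+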
-1.3.1.1. Range Partitioning
+### **ZoneMap index**
-In business, most users will choose to partition on time, which has the
following advantages:
+ZoneMap index is index information automatically maintained on a per-column
basis in the columnar storage format. It includes information such as Min/Max
values and the number of Null values. During data querying, the ZoneMap index
is utilized to select the data range to scan based on the filtered fields using
range conditions.
-* Differentiable heat and cold data
-* Availability of Doris Hierarchical Storage (SSD + SATA)
+For example, when filtering the "age" field with the following query statement:
-1.3.1.2. List Partitioning
+```SQL
+SELECT * FROM table WHERE age > 0 and age < 51;
+```
-In business,, users can select cities or other enumeration values for
partition.
+If the Short Key Index is not hit, the ZoneMap index will be used to narrow down the approximate range of data that needs to be scanned based on the query condition on the "age" field. This reduces the number of pages that need to be scanned.
-1.3.2. Hash Bucketing
+### **Inverted index**
-The data is divided into different buckets according to the hash value.
+Doris supports inverted indexes since version 2.0.0. Inverted index can be
used for full-text searches on text data and range queries on regular numeric
and date types. It enables fast filtering of rows that meet the conditions from
massive amounts of data.
-* It is suggested that columns with large differentiation should be used as
buckets to avoid data skew.
-* In order to facilitate data recovery, it is suggested that the size of a
single bucket should not be too large and should be kept within 10GB.
Therefore, the number of buckets should be considered reasonably when building
tables or increasing partitions, among which different partitions can specify
different buckets.
+**Best practice**
-### 1.4 Sparse Index and Bloom Filter
+```SQL
+-- Inverted index can be specified during table creation or added later. This
is an example of specifying it during table creation:
+CREATE TABLE table_name
+(
+    columns_definition,
+ INDEX idx_name1(column_name1) USING INVERTED [PROPERTIES("parser" =
"english|unicode|chinese")] [COMMENT 'your comment']
+ INDEX idx_name2(column_name2) USING INVERTED [PROPERTIES("parser" =
"english|unicode|chinese")] [COMMENT 'your comment']
+ INDEX idx_name3(column_name3) USING INVERTED [PROPERTIES("parser" =
"chinese", "parser_mode" = "fine_grained|coarse_grained")] [COMMENT 'your
comment']
+ INDEX idx_name4(column_name4) USING INVERTED [PROPERTIES("parser" =
"english|unicode|chinese", "support_phrase" = "true|false")] [COMMENT 'your
comment']
+    INDEX idx_name5(column_name4) USING INVERTED [PROPERTIES("char_filter_type" = "char_replace", "char_filter_pattern" = "._", "char_filter_replacement" = " ")] [COMMENT 'your comment']
+    INDEX idx_name6(column_name5) USING INVERTED [PROPERTIES("char_filter_type" = "char_replace", "char_filter_pattern" = "._")] [COMMENT 'your comment']
+)
+table_properties;
-Doris stores the data in an orderly manner, and builds a sparse index for
Doris on the basis of ordered data. The index granularity is block (1024 rows).
+-- Example: keyword matching in full-text searches, implemented by MATCH_ANY or MATCH_ALL
+SELECT * FROM table_name WHERE column_name MATCH_ANY | MATCH_ALL 'keyword1
...';
+```
-Sparse index chooses fixed length prefix in schema as index content, and Doris
currently chooses 36 bytes prefix as index.
+:::tip
+**Recommendations**
-* When building tables, it is suggested that the common filter fields in
queries should be placed in front of Schema. The more distinguishable the query
fields are, the more frequent the query fields are.
-* One particular feature of this is the varchar type field. The varchar type
field can only be used as the last field of the sparse index. The index is
truncated at varchar, so if varchar appears in front, the length of the index
may be less than 36 bytes. Specifically, you can refer to [data
model](./data-model.md), [ROLLUP and query](./hit-the-rollup.md).
-* In addition to sparse index, Doris also provides bloomfilter index.
Bloomfilter index has obvious filtering effect on columns with high
discrimination. If you consider that varchar cannot be placed in a sparse
index, you can create a bloomfilter index.
+1. If it is difficult to design a partitioning, bucketing, and prefix index
strategy for your data range, consider introducing inverted indexes for
acceleration.
+:::
-### 1.5 Rollup
-Rollup can essentially be understood as a physical index of the original
table. When creating Rollup, only some columns in Base Table can be selected as
Schema. The order of fields in Schema can also be different from that in Base
Table.
+:::caution
+**Restrictions**
-Rollup can be considered in the following cases:
+1. Different data models have different restrictions on inverted index.
+ 1. Aggregate KEY model: only allows inverted index for Key columns
+ 2. Unique KEY model: allows inverted index for any column after enabling
Merge-on-Write
+ 3. Duplicate KEY model: allows inverted index for any column
+:::
-1.5.1. Low ratio of data aggregation in the Base Table
+### **BloomFilter index**
-This is usually due to the fact that Base Table has more differentiated
fields. At this point, you can consider selecting some columns and establishing
Rollup.
+Doris supports adding BloomFilter indexes to fields with high value
distinctiveness, making it suitable for scenarios that involve equivalence
queries on columns with high cardinality.
-For the `site_visit'table:
+**Best practice**
+```SQL
+-- Example: add "bloom_filter_columns"="k1,k2,k3" in the PROPERTIES of the
table creation statement.
+-- To create BloomFilter index for saler_id and category_id in the table.
+CREATE TABLE IF NOT EXISTS sale_detail_bloom (
+    sale_date date NOT NULL COMMENT "Sale date",
+ customer_id int NOT NULL COMMENT "Customer ID",
+ saler_id int NOT NULL COMMENT "Saler ID",
+ sku_id int NOT NULL COMMENT "SKU ID",
+ category_id int NOT NULL COMMENT "Category ID",
+ sale_count int NOT NULL COMMENT "Sale count",
+ sale_price DECIMAL(12,2) NOT NULL COMMENT "Sale price",
+ sale_amt DECIMAL(20,2) COMMENT "Sale amount"
+)
+Duplicate KEY(sale_date, customer_id,saler_id,sku_id,category_id)
+DISTRIBUTED BY HASH(saler_id) BUCKETS 10
+PROPERTIES (
+"bloom_filter_columns"="saler_id,category_id"
+);
```
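+
+If the table already exists, the BloomFilter columns can also be adjusted afterwards through the table properties; a minimal sketch, assuming the `sale_detail_bloom` table above:
+
+```SQL
+-- Add or change BloomFilter index columns on an existing table
+ALTER TABLE sale_detail_bloom SET ("bloom_filter_columns" = "saler_id,category_id");
+```
+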
-site -u visit (siteid, city, username, pv)
-```
-
-Siteid may lead to a low degree of data aggregation. If business parties often
base their PV needs on city statistics, they can build a city-only, PV-based
rollup:
-```
-ALTER TABLE site_visit ADD ROLLUP rollup_city(city, pv);
+:::caution
+**Restrictions**
+
+1. BloomFilter indexes are not supported for columns of type Tinyint, Float,
and Double.
+2. BloomFilter indexes can only accelerate filtering using "in" and "="
operators.
+3. BloomFilter indexes must be built on high-cardinality columns (above 5000)
in query conditions that involve "in" or "=" operators.
+ 1. BloomFilter indexes are suitable for non-prefix filtering.
+ 2. Queries will filter based on the high-frequency values in the column,
and the filtering conditions are mostly "in" and "=".
+ 3. Unlike Bitmap indexes, BloomFilter indexes are suitable for
high-cardinality columns, such as UserID. If created on low-cardinality columns
like "gender", each block will contain almost all values, rendering the
BloomFilter index meaningless.
+ 4. It is suitable for cases with data cardinality around half of the total
range.
+    5. For equality (=) queries on high-cardinality columns, such as ID numbers, BloomFilter indexes can greatly accelerate performance.
+:::
+
+### **Ngram BloomFilter index**
+
+Since 2.0.0, Doris has introduced the NGram BloomFilter index to improve the
performance of "**LIKE**" queries.
+
+**Best practice**
+
+```SQL
+-- Example of creating NGram BloomFilter index in table creation statement
+CREATE TABLE `nb_table` (
+ `siteid` int(11) NULL DEFAULT "10" COMMENT "",
+ `citycode` smallint(6) NULL COMMENT "",
+ `username` varchar(32) NULL DEFAULT "" COMMENT "",
+ INDEX idx_ngrambf (`username`) USING NGRAM_BF PROPERTIES("gram_size"="3",
"bf_size"="256") COMMENT 'username ngram_bf index'
+) ENGINE=OLAP
+AGGREGATE KEY(`siteid`, `citycode`, `username`) COMMENT "OLAP"
+DISTRIBUTED BY HASH(`siteid`) BUCKETS 10;
+
+-- PROPERTIES("gram_size"="3", "bf_size"="256"), representing the number of
grams and the byte size of the BloomFilter
+-- The number of grams is determined according to the query cases and is
typically set to the length of the majority of query strings. The number of
bytes in the BloomFilter can be determined after testing. Generally, a larger
number of bytes leads to better filtering results, and it is recommended to
start with a value of 256 for testing and evaluating the effectiveness.
However, it's important to note that a larger number of bytes also increases
the storage cost of the index.
+-- With high data cardinality, there is no need to set a large BloomFilter
size. Conversely, with low data cardinality, increase the BloomFilter size to
enhance filtering efficiency.
```
-1.5.2. The prefix index in Base Table cannot be hit
+:::caution
+**Restrictions**
-Generally, the way Base Table is constructed cannot cover all query modes. At
this point, you can consider adjusting the column order and establishing Rollup.
+1. NGram BloomFilter index only supports string columns.
+2. NGram BloomFilter indexes and BloomFilter indexes are mutually exclusive,
meaning that only one of them can be set for the same column.
+3. The sizes of the NGram and the BloomFilter can both be optimized based on
the actual situation. If the NGram size is relatively small, you may increase
the BloomFilter size.
+4. For data at the scale of billions or above, if there is a need for fuzzy
matching, it is recommended to use inverted indexes or NGram BloomFilter.
+:::
-Database Session
+### **Bitmap index**
-```
-session -u data (visitorid, sessionid, visittime, city, province, ip, browser,
url)
-```
+To accelerate data queries, Doris supports users in adding Bitmap indexes to
certain fields. This is suitable for scenarios involving equivalence or range
queries on columns with lower cardinality.
-In addition to visitorid analysis, there are browser and province analysis
cases, Rollup can be established separately.
+**Best practice**
+```SQL
+-- Example: create Bitmap index for siteid on bitmap_table
+CREATE INDEX [IF NOT EXISTS] bitmap_index_name ON
+bitmap_table (siteid)
+USING BITMAP COMMENT 'bitmap_siteid';
```
-ALTER TABLE session_data ADD ROLLUP rollup_browser(browser,province,ip,url)
DUPLICATE KEY(browser,province);
-```
-
-## Schema Change
-
-Users can modify the Schema of an existing table through the Schema Change
operation, currently Doris supports the following modifications:
-- Adding and deleting columns
-- Modify column types
-- Reorder columns
-- Adding or removing index
+:::caution
+**Restrictions**
+
+1. Bitmap indexes can only be created on a single column.
+2. Bitmap indexes can be applied to all columns in the `Duplicate` and
`Unique` Key models, as well as the key columns in the `Aggregate` Key model.
+3. Bitmap indexes support the following data types:
+ 1. `TINYINT`
+ 2. `SMALLINT`
+ 3. `INT`
+ 4. `BIGINT`
+ 5. `CHAR`
+ 6. `VARCHAR`
+ 7. `DATE`
+ 8. `DATETIME`
+ 9. `LARGEINT`
+ 10. `DECIMAL`
+ 11. `BOOL`
+4. Bitmap indexes only take effect under Segment V2. The storage format of
tables with Bitmap indexes will be automatically converted to V2 format by
default.
+5. Bitmap indexes should be constructed within a certain cardinality range. It
is not suitable for extremely high or low cardinality cases.
+    1. It is recommended for columns with a cardinality between 100 and 100,000, such as an occupation or city field. If the duplication rate is too high, Bitmap indexes offer no significant advantage over other types of indexes. If the duplication rate is too low, Bitmap indexes consume considerable space and can hurt performance. Specific types of queries, such as count, OR, and AND operations, only require bitwise operations.
+ 2. Bitmap indexes are more suitable for orthogonal queries.
+:::
+
+## Field type
+
+Doris supports various field types, including precise deduplication with
BITMAP, fuzzy deduplication with HLL, semi-structured data types such as
ARRAY/MAP/JSON, as well as common numeric, string, and time types.
+
+:::tip
+**Recommendations**
+
+1. VARCHAR
+ 1. Variable-length string with a length range of 1-65533 bytes. It is
stored in UTF-8 encoding, where English characters typically occupy 1 byte.
+ 2. There is often a misunderstanding about the performance difference
between varchar(255) and varchar(65533). If the data stored in both cases is
the same, the performance will be the same as well. When creating a table, if
you are unsure about the maximum length of the field, it is recommended to use
varchar(65533) to prevent import errors caused by excessively long strings.
+2. STRING
+ 1. Variable-length string with a default size of 1048576 bytes (1MB), which
can be increased to 2147483643 bytes (2GB). It is stored in UTF-8 encoding,
where English characters typically occupy 1 byte.
+ 2. It can only be used in value columns but not key columns or partitioning
columns.
+ 3. It is suitable for storing large text content. However, if such a
requirement does not exist, it is recommended to use VARCHAR. STRING columns
have limitations as they cannot be used in key columns or partitioning columns.
+3. Numeric fields: choose the appropriate data type based on the required precision. There are no special restrictions on this.
+4. Time fields: note that if high precision is required (timestamps accurate to milliseconds or finer), you need to declare the column as datetime(6). Otherwise, sub-second precision is not supported by default.
+5. It is recommended to use the JSON data type instead of a string type for storing JSON data. A sketch covering items 4 and 5 follows this note.
+:::
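+
+A minimal sketch illustrating items 4 and 5 above; the table and column names are hypothetical:
+
+```SQL
+-- High-precision timestamps and native JSON storage
+CREATE TABLE event_log
+(
+    event_id   BIGINT,
+    event_time DATETIME(6) COMMENT "Microsecond-precision timestamp",
+    payload    JSON COMMENT "Event payload stored as JSON instead of a string"
+)
+DUPLICATE KEY(event_id)
+DISTRIBUTED BY HASH(event_id) BUCKETS 10;
+```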
+
+## Create table
+
+
+
+Considerations in creating a table include the setting of data partitions and
buckets in addition to data model, index, and field types.
+
+**Best practice**
+
+```SQL
+-- Take Merge-on-Write tables in the Unique Key model as an example:
+-- Merge-on-Write in the Unique Key model is implemented differently from the Aggregate Key model, and its query performance is close to that of the Duplicate Key model.
+-- In use cases requiring primary key constraints, it delivers much better query performance than the Aggregate-Key-based implementation of the Unique Key model, especially in aggregate queries and queries that involve filtering a large amount of data using indexes.
+
+-- For non-partitioned tables
+CREATE TABLE IF NOT EXISTS tbl_unique_merge_on_write
+(
+    `user_id` LARGEINT NOT NULL COMMENT "User ID",
+ `username` VARCHAR(50) NOT NULL COMMENT "Username",
+ `register_time` DATE COMMENT "User registration time",
+ `city` VARCHAR(20) COMMENT "User city",
+ `age` SMALLINT COMMENT "User age",
+ `sex` TINYINT COMMENT "User gender",
+ `phone` LARGEINT COMMENT "User phone number",
+ `address` VARCHAR(500) COMMENT "User address"
+)
+UNIQUE KEY(`user_id`, `username`)
+-- Data volume of 3~5G
+DISTRIBUTED BY HASH(`user_id`) BUCKETS 10
+PROPERTIES (
+-- In Doris 1.2.0, as a new feature, Merge-on-Write is disabled by default.
Users can enable it by adding the following property.
+"enable_unique_key_merge_on_write" = "true"
+);
+
+-- For partitioned tables
+CREATE TABLE IF NOT EXISTS tbl_unique_merge_on_write_p
+(
+    `user_id` LARGEINT NOT NULL COMMENT "User ID",
+ `username` VARCHAR(50) NOT NULL COMMENT "Username",
+ `register_time` DATE COMMENT "User registration time",
+ `city` VARCHAR(20) COMMENT "User city",
+ `age` SMALLINT COMMENT "User age",
+ `sex` TINYINT COMMENT "User gender",
+ `phone` LARGEINT COMMENT "User phone number",
+ `address` VARCHAR(500) COMMENT "User address"
+)
+UNIQUE KEY(`user_id`, `username`, `register_time`)
+PARTITION BY RANGE(`register_time`) (
+ PARTITION p00010101_1899 VALUES [('0001-01-01'), ('1900-01-01')),
+ PARTITION p19000101 VALUES [('1900-01-01'), ('1900-01-02')),
+ PARTITION p19000102 VALUES [('1900-01-02'), ('1900-01-03')),
+ PARTITION p19000103 VALUES [('1900-01-03'), ('1900-01-04')),
+ PARTITION p19000104_1999 VALUES [('1900-01-04'), ('2000-01-01')),
+ FROM ("2000-01-01") TO ("2022-01-01") INTERVAL 1 YEAR,
+ PARTITION p30001231 VALUES [('3000-12-31'), ('3001-01-01')),
+ PARTITION p99991231 VALUES [('9999-12-31'), (MAXVALUE))
+)
+-- Data volume of 3~5G
+DISTRIBUTED BY HASH(`user_id`) BUCKETS 10
+PROPERTIES (
+-- In Doris 1.2.0, as a new feature, Merge-on-Write is disabled by default.
Users can enable it by adding the following property.
+"enable_unique_key_merge_on_write" = "true",
+-- The unit for dynamic partition scheduling can be specified as HOUR, DAY,
WEEK, MONTH, or YEAR.
+"dynamic_partition.time_unit" = "MONTH",
+-- The starting offset for dynamic partitioning is specified as a negative
number. Depending on the value of the time_unit, it uses the current day
(week/month) as the reference point, partitions prior to this offset will be
deleted (TTL). If not specified, the default value is -2147483648, indicating
that historical partitions will not be deleted.
+"dynamic_partition.start" = "-3000",
+-- The ending offset for dynamic partitioning is specified as a positive
number. Depending on the value of the time_unit, it uses the current day
(week/month) as the reference point. Create the corresponding partitions of the
specified range in advance.
+"dynamic_partition.end" = "10",
+-- The prefix for names of the dynamically created partitions (required).
+"dynamic_partition.prefix" = "p",
+-- The number of buckets corresponding to the dynamically created partitions.
+"dynamic_partition.buckets" = "10",
+"dynamic_partition.enable" = "true",
+-- The following is the number of replicas corresponding to dynamically
created partitions. If not specified, the default value will be the replication
factor specified when creating the table, which is typically 3.
+"dynamic_partition.replication_num" = "3",
+"replication_num" = "3"
+);
+
+-- View existing partitions
+-- The actual number of created partitions is determined by a combination of
dynamic_partition.start, dynamic_partition.end, and the settings of PARTITION
BY RANGE.
+show partitions from tbl_unique_merge_on_write_p;
+```
-For details, please refer to [Schema
Change](../advanced/alter-table/schema-change.md)
\ No newline at end of file
+:::tip
+**Recommendations**
+
+1. The database name should be in lowercase, separated by underscores (_) with
a maximum length of 62 bytes.
+2. Table names are case-sensitive and should be in lowercase, separated by
underscores (_) with a maximum length of 64 bytes.
+3. Manual bucketing is preferred over auto bucketing. It is recommended to
partition and bucket your data based on your specific data volume, as this can
improve data writing and query performance. Auto bucketing can result in a
large number of tablets and an abundance of small files.
+4. For data volumes ranging from 10 million to 200 million rows, you can
directly set a bucketing strategy and skip the partition configurations. (Doris
internally has default partitions for the table).
+5. In **time-series scenarios**, it is recommended to include `"compaction_policy" = "time_series"` in the table properties when creating the table (see the sketch after this note). This effectively reduces the write amplification of compaction, especially in continuous data ingestion scenarios. Note that it should be used in conjunction with inverted indexes.
+:::
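+
+A minimal sketch of a time-series table with this property; the table definition is hypothetical and abbreviated:
+
+```SQL
+-- Append-only metrics table using the time_series compaction policy
+CREATE TABLE metrics_ts
+(
+    ts           DATETIME,
+    device_id    BIGINT,
+    metric       VARCHAR(64),
+    metric_value DOUBLE
+)
+DUPLICATE KEY(ts, device_id)
+DISTRIBUTED BY HASH(device_id) BUCKETS 10
+PROPERTIES (
+    "compaction_policy" = "time_series"
+);
+```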
+
+
+:::caution
+**Restrictions**
+
+1. The database character set should be specified as UTF-8 since only UTF-8 is
supported.
+2. The replication factor for tables must be 3 (if not specified, it defaults
to 3).
+3. The data volume of an individual tablet (**Tablet Count = Partition Count
\* Bucket Count \* Replication Factor**) theoretically has no upper or lower
bounds, except for small tables (in the range of hundreds of megabytes to one
gigabyte), where it should be ensured to be within the range of 1 GB to 10 GB:
+ 1. If the data volume of an individual tablet is too small, it can result
in poor data aggregation performance and increased metadata management overhead.
+ 2. If the data volume is too large, it hinders replica migration,
synchronization, and increases the cost of schema changes or materialization
operations (these operations are retried at the granularity of a tablet).
+4. For data exceeding 500 million records, **partitioning and bucketing**
strategies must be implemented:
+ 1. **Recommendations for bucketing:**
+ 1. For large tables, each tablet should be in the range of 1 GB to 10 GB
to prevent the generation of too many small files.
+ 2. For dimension tables of approximately 100 megabytes, the number of
tablets should be controlled within the range of 3 to 5. This ensures a certain
level of concurrency without generating excessive small files.
+ 2. If partitioning is not feasible and the data grows rapidly without the
possibility of dynamic time-based partitioning, it is advisable to increase the
number of buckets to accommodate the data volume based on the data retention
period (180 days). It is still recommended to keep the size of each bucket
between 1 GB and 10 GB.
+ 3. Apply salting to the bucketing field and use the same salting strategy
for queries in order to leverage bucket pruning capabilities.
+ 4. Random bucketing:
+ 1. If an OLAP table does not have fields that need to be updated,
setting the data bucketing mode to RANDOM can avoid severe data skew. During
data ingestion, each batch of data is randomly assigned to a tablet for writing.
+ 2. When the bucketing mode for a table is set to RANDOM, since there is
no bucketing column, querying the table will scan all buckets in the hit
partitions instead of querying specific buckets based on the values of the
bucketing column. This setting is suitable for overall aggregation and analysis
queries rather than high-concurrency point queries.
+ 3. If an OLAP table has a random distribution of data, setting the
`load_to_single_tablet` parameter to true during data ingestion allows each
task to write to a single tablet. This improves concurrency and throughput
during large-scale data ingestion. It can also reduce the write amplification
caused by data ingestion and compaction and ensure cluster stability.
+ 5. Dimension tables, which grow slowly, can use a single partition and
apply bucketing based on commonly used query conditions (where the data
distribution of the bucketing field is relatively even).
+ 6. Fact tables.
+5. If the bucketing field exhibits more than 30% data skew, it is recommended
to avoid using the Hash bucketing strategy and instead use the RANDOM bucketing
strategy.
+6. Dynamic partitioning should not be used for tables with fewer than 20 million rows, because it automatically creates partitions, and for small tables it might unnecessarily create a large number of partitions and buckets.
+7. For scenarios where there is a large amount of historical partitioned data
but the historical data is relatively small, unbalanced, or queried
infrequently, you can use the following approach to place the data in special
partitions. You can create historical partitions for historical data of small
sizes (e.g., yearly partitions, monthly partitions). For example, you can
create historical partitions for data `FROM ("2000-01-01") TO ("2022-01-01")
INTERVAL 1 YEAR`:
+ 1. (
+
+ 2. PARTITION p00010101_1899 VALUES [('0001-01-01'), ('1900-01-01')),
+
+ 3. PARTITION p19000101 VALUES [('1900-01-01'), ('1900-01-02')),
+
+ 4. ...
+
+ 5. PARTITION p19000104_1999 VALUES [('1900-01-04'), ('2000-01-01')),
+
+ 6. FROM ("2000-01-01") TO ("2022-01-01") INTERVAL 1 YEAR,
+
+ 7. PARTITION p30001231 VALUES [('3000-12-31'), ('3001-01-01')),
+
+ 8. PARTITION p99991231 VALUES [('9999-12-31'), (MAXVALUE))
+
+ 9. )
+8. The number of single-table materialized views should not exceed 6:
+ 1. Single-table materialized views are built in real time.
+ 2. On the Unique Key model, materialized views can only be used to reorder
keys and cannot be used for data aggregation, as the aggregation model for the
Unique Key model is Replace.
+:::
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]