This is an automated email from the ASF dual-hosted git repository.

luzhijing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git


The following commit(s) were added to refs/heads/master by this push:
     new fa7a711258bf [Fix](Docs) fix Export Docs (#449)
fa7a711258bf is described below

commit fa7a711258bf0d3699c0b4ac6008b93ffc671e69
Author: Tiewei Fang <[email protected]>
AuthorDate: Fri Mar 22 12:44:07 2024 +0800

    [Fix](Docs) fix Export Docs (#449)
---
 .../Data-Manipulation-Statements/Manipulation/EXPORT.md           | 7 +++++++
 .../Data-Manipulation-Statements/Manipulation/EXPORT.md           | 8 ++++++++
 2 files changed, 15 insertions(+)

diff --git 
a/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/EXPORT.md
 
b/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/EXPORT.md
index c6479d0e8245..fd350ab0bf5d 100644
--- 
a/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/EXPORT.md
+++ 
b/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/EXPORT.md
@@ -96,6 +96,8 @@ The bottom layer of the `Export` statement actually executes 
the `select...outfi
   
   - `with_bom`: The default is false. If it is set to true, the exported file 
is encoded in UTF8 with BOM (valid only for CSV-related file format).
 
+  - `data_consistency`: Can be set to `none` or `partition`; the default is 
`none`. This parameter sets the granularity at which the exported table is 
split: `none` splits at the tablet level, and `partition` splits at the 
partition level.
+
   - `timeout`: This is the timeout parameter of the export job, the default 
timeout is 2 hours, and the unit is seconds.
 
   > Note that to use the `delete_existing_files` parameter, you also need to 
add the configuration `enable_delete_existing_files = true` to the fe.conf file 
and restart the FE. Only then will the `delete_existing_files` parameter take 
effect. Setting `delete_existing_files = true` is a dangerous operation and it 
is recommended to only use it in a testing environment.
@@ -360,6 +362,11 @@ WITH BROKER "broker_name"
 
   - If a thread is responsible for 14 tablets and 
`maximum_tablets_of_outfile_in_export = 10`, then the thread will be 
responsible for two `SELECT INTO OUTFILE` statements. The first `SELECT INTO 
OUTFILE` statement exports 10 tablets, and the second `SELECT INTO OUTFILE` 
statement exports 4 tablets. The two `SELECT INTO OUTFILE` statements are 
executed serially by this thread.
 
+  If you want to export the table at partition granularity, set the export 
property `"data_consistency" = "partition"`. The export task's concurrent 
threads then divide the work into multiple `Outfile` statements at partition 
granularity. Different `Outfile` statements export different partitions, and 
the data exported by a single `Outfile` statement is guaranteed to belong to 
the same partition. For example, with `"data_consistency" = "partition"`:
+
+  - If num(partition) = 40 and parallelism = 3, the three threads are 
responsible for 14, 13, and 13 partitions, respectively.
+  - If num(partition) = 2 and parallelism = 3, Doris automatically sets the 
parallelism to 2, and each thread is responsible for one partition.
+
   #### memory limit
 
   The query plan for an `Export Job` typically involves only `scanning and 
exporting`, and does not involve compute logic that requires a lot of memory. 
Therefore, the default memory limit of 2GB is usually sufficient to meet the 
requirements.
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/EXPORT.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/EXPORT.md
index 28021c8d7f04..d7a582d958e8 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/EXPORT.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/EXPORT.md
@@ -92,6 +92,8 @@ EXPORT
 
   - `with_bom`: The default is false. If set to true, the exported file is 
encoded in UTF8 with BOM (valid only for CSV-related file formats).
 
+  - `data_consistency`: Can be set to `none` or `partition`; the default is 
`none`. It indicates the granularity at which the exported table is split: 
`none` means the tablet level, and `partition` means the partition level.
+
   - `timeout`: The timeout of the export job. The default is 2 hours, and the 
unit is seconds.
 
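   > 
Note: To use the delete_existing_files parameter, you also need to add the 
configuration `enable_delete_existing_files = true` to fe.conf and restart the 
FE; only then does delete_existing_files take effect. Setting 
delete_existing_files = true is a dangerous operation, and it is recommended 
to use it only in a testing environment.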
@@ -348,6 +350,12 @@ The logic for splitting an Export job into multiple `SELECT INTO OUTFILE` statements is: the table
 
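 When the amount of data to export is large, consider increasing the 
`parallelism` parameter to export with more concurrency. If CPU cores are 
scarce and `parallelism` cannot be increased further, but the exported table 
has many tablets, consider increasing `maximum_tablets_of_outfile_in_export` 
so that each `SELECT INTO OUTFILE` statement handles more tablets, which can 
also speed up the export.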
 
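+To export a table at partition granularity, set the export property 
`"data_consistency" = "partition"`. The export task's concurrent threads then 
divide the work into multiple Outfile statements at partition granularity. 
Different Outfile statements export different partitions, and the data 
exported by a single Outfile statement always belongs to the same partition. 
For example, after setting `"data_consistency" = "partition"`: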
+
+- If num(partition) = 40 and parallelism = 3, the three threads are 
responsible for 14, 13, and 13 partitions, respectively.
+- If num(partition) = 2 and parallelism = 3, Doris automatically sets the 
parallelism to 2, and each thread is responsible for one partition.
+
+
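 #### Memory limit
 
 The query plan of an Export job usually has only two parts, `scan-export`, 
and does not involve computation logic that needs much memory. So the default 
memory limit of 2GB is usually sufficient.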
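
As a rough illustration of the partition-granularity split documented in this commit, the following Python sketch reproduces the two examples from the added text (40 partitions with parallelism 3, and 2 partitions with parallelism 3). The function name `split_partitions` is hypothetical, and the rule that surplus partitions go to the first threads is an assumption chosen only because it matches the documented 14, 13, 13 result; it is not taken from the Doris source code.

```python
def split_partitions(num_partitions: int, parallelism: int) -> list[int]:
    """Return how many partitions each export thread would handle.

    Hypothetical helper: models the behaviour described in EXPORT.md,
    not the actual Doris implementation.
    """
    if num_partitions <= 0 or parallelism <= 0:
        raise ValueError("num_partitions and parallelism must be positive")

    # The docs say parallelism is lowered automatically when there are
    # fewer partitions than requested threads.
    threads = min(parallelism, num_partitions)

    # Assumption: partitions are spread as evenly as possible, with the
    # remainder assigned to the first threads.
    base, extra = divmod(num_partitions, threads)
    return [base + 1 if i < extra else base for i in range(threads)]


print(split_partitions(40, 3))  # [14, 13, 13]
print(split_partitions(2, 3))   # [1, 1]
```

The second call shows the effective parallelism dropping to 2, so each thread handles exactly one partition.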

