This is an automated email from the ASF dual-hosted git repository.
luzhijing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new fa7a711258bf [Fix](Docs) fix Export Docs (#449)
fa7a711258bf is described below
commit fa7a711258bf0d3699c0b4ac6008b93ffc671e69
Author: Tiewei Fang <[email protected]>
AuthorDate: Fri Mar 22 12:44:07 2024 +0800
[Fix](Docs) fix Export Docs (#449)
---
.../Data-Manipulation-Statements/Manipulation/EXPORT.md | 7 +++++++
.../Data-Manipulation-Statements/Manipulation/EXPORT.md | 8 ++++++++
2 files changed, 15 insertions(+)
diff --git
a/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/EXPORT.md
b/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/EXPORT.md
index c6479d0e8245..fd350ab0bf5d 100644
---
a/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/EXPORT.md
+++
b/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/EXPORT.md
@@ -96,6 +96,8 @@ The bottom layer of the `Export` statement actually executes
the `select...outfi
- `with_bom`: The default is false. If it is set to true, the exported file
is encoded in UTF8 with BOM (valid only for CSV-related file format).
+ - `data_consistency`: can be set to `none` / `partition`; the default value is
`none`. This parameter indicates the granularity at which the exported table is
split: `none` represents the tablet level, and `partition` represents the
partition level.
+
- `timeout`: This is the timeout parameter of the export job, the default
timeout is 2 hours, and the unit is seconds.
> Note that to use the `delete_existing_files` parameter, you also need to
add the configuration `enable_delete_existing_files = true` to the fe.conf file
and restart the FE. Only then will the `delete_existing_files` parameter take
effect. Setting `delete_existing_files = true` is a dangerous operation and it
is recommended to only use it in a testing environment.
@@ -360,6 +362,11 @@ WITH BROKER "broker_name"
- If a thread is responsible for 14 tablets and
`maximum_tablets_of_outfile_in_export = 10`, then the thread will be
responsible for two `SELECT INTO OUTFILE` statements. The first `SELECT INTO
OUTFILE` statement exports 10 tablets, and the second `SELECT INTO OUTFILE`
statement exports 4 tablets. The two `SELECT INTO OUTFILE` statements are
executed serially by this thread.
+ If you want to export the table at partition granularity, you can set the
export property `"data_consistency" = "partition"`. In this case, the concurrent
threads of the export task are split into multiple `Outfile` statements at
partition granularity. Different `Outfile` statements export different
partitions, and the data exported by a single `Outfile` statement is guaranteed
to belong to the same partition. For example, when you set `"data_consistency"
= "partition"`:
+
+ - num(partition) = 40, parallelism = 3, then the three threads will be
responsible for 14, 13, and 13 partitions, respectively.
+ - num(partition) = 2, parallelism = 3, then Doris automatically sets the
parallelism to 2, and each thread is responsible for one partition.
+
#### memory limit
The query plan for an `Export Job` typically involves only `scanning and
exporting`, and does not involve compute logic that requires a lot of memory.
Therefore, the default memory limit of 2GB is usually sufficient to meet the
requirements.
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/EXPORT.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/EXPORT.md
index 28021c8d7f04..d7a582d958e8 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/EXPORT.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/EXPORT.md
@@ -92,6 +92,8 @@ EXPORT
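- `with_bom`: The default is false. If it is set to true, the exported file is encoded in UTF8 with BOM (valid only for CSV-related file formats).
+ - `data_consistency`: can be set to `none` / `partition`; the default is `none`.
This parameter indicates the granularity at which the exported table is split: `none` represents the tablet level, and `partition` represents the partition level.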
+
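- `timeout`: The timeout of the export job. The default is 2 hours, and the unit is seconds.
>
Note: to use the delete_existing_files parameter, you also need to add the configuration `enable_delete_existing_files =
true` to fe.conf and restart the FE. Only then does delete_existing_files take effect. Setting delete_existing_files = true
is a dangerous operation, and it is recommended to use it only in a testing environment.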
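@@ -348,6 +350,12 @@ The Export job is split into multiple `SELECT INTO OUTFILE` statements as follows: the table
When the amount of data to be exported is very large, consider increasing the `parallelism` parameter to raise export concurrency. If CPU cores are scarce and `parallelism` cannot be increased further
while the exported table has many tablets, consider increasing `maximum_tablets_of_outfile_in_export` so that one `SELECT INTO
OUTFILE` statement handles more tablets, which can also speed up the export.
+If you want to export the table at partition granularity, you can set the Export property `"data_consistency" = "partition"`.
In this case, the concurrent threads of the Export task are split into multiple Outfile statements at partition granularity. Different Outfile statements export different partitions,
and the data exported by a single Outfile statement always belongs to the same partition. For example, after setting `"data_consistency" =
"partition"`: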
+
+- num(partition) = 40, parallelism = 3: the 3 threads are responsible for 14, 13, and 13 partitions, respectively.
+- num(partition) = 2, parallelism = 3: Doris automatically sets the parallelism to 2, and each thread is responsible for one
partition.
+
+
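#### Memory limit
The query plan of an Export job usually has only two parts, `scan-export`, and involves no compute logic that requires much memory. Therefore, the default memory limit of 2GB is usually sufficient.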
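
The two splitting examples in the EXPORT.md changes above can be checked with a short arithmetic sketch. This is not Doris code: the function names `split_across_threads` and `outfile_batches` are hypothetical, and the even-split rule (the remainder goes to the first threads) is an assumption chosen only because it reproduces the 14 / 13 / 13 figure quoted in the docs.

```python
from typing import List


def split_across_threads(num_units: int, parallelism: int) -> List[int]:
    """Return how many units (partitions or tablets) each thread handles.

    Assumption: the split is as even as possible, with any remainder
    assigned one unit at a time to the first threads. The parallelism is
    capped at the number of units, matching the documented behaviour of
    lowering it when there are fewer partitions than threads.
    """
    if num_units <= 0 or parallelism <= 0:
        return []
    threads = min(parallelism, num_units)
    base, extra = divmod(num_units, threads)
    return [base + 1 if i < extra else base for i in range(threads)]


def outfile_batches(num_tablets: int, max_tablets_per_outfile: int) -> List[int]:
    """Return the tablet count of each serial SELECT INTO OUTFILE statement
    run by one thread, given a cap per statement (the docs' example uses
    maximum_tablets_of_outfile_in_export = 10)."""
    if num_tablets <= 0 or max_tablets_per_outfile <= 0:
        return []
    full, rest = divmod(num_tablets, max_tablets_per_outfile)
    return [max_tablets_per_outfile] * full + ([rest] if rest else [])


if __name__ == "__main__":
    print(split_across_threads(40, 3))  # 40 partitions, parallelism = 3
    print(split_across_threads(2, 3))   # 2 partitions, parallelism = 3
    print(outfile_batches(14, 10))      # 14 tablets, cap of 10 per statement
```

Under these assumptions the three calls give `[14, 13, 13]`, `[1, 1]`, and `[10, 4]`, which agree with the numbers in the documentation text.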