This is an automated email from the ASF dual-hosted git repository.

luzhijing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git


The following commit(s) were added to refs/heads/master by this push:
     new 455911676a71 [docs]update en docs of stream load (#486)
455911676a71 is described below

commit 455911676a7183ca8ef76426a5f9a65a6a4a46ae
Author: Luzhijing <[email protected]>
AuthorDate: Thu Mar 28 19:44:26 2024 +0800

    [docs]update en docs of stream load (#486)
---
 .../data-operate/import/stream-load-manual.md      |   28 +-
 .../data-operate/import/stream-load-manual.md      | 1076 +++++++++++++++++++-
 2 files changed, 1048 insertions(+), 56 deletions(-)

diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/data-operate/import/stream-load-manual.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/data-operate/import/stream-load-manual.md
index 170b352084af..4800abae7a97 100644
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/data-operate/import/stream-load-manual.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/data-operate/import/stream-load-manual.md
@@ -310,10 +310,10 @@ Stream Load 操作支持 HTTP 分块导入(HTTP chunked)与 HTTP 非分块
 | label                        | 用于指定 Doris 该次导入的标签,标签相同的数据无法多次导入。如果不指定 
label,Doris 会自动生成一个标签。用户可以通过指定 label 的方式来避免一份数据重复导入的问题。Doris 默认保留三天内的导入作业标签,可以 
`label_keep_max_second` 调整保留时长。例如,指定本次导入 label 为 123,需要指定命令 `-H 
"label:123"`。label 的使用,可以防止用户重复导入相同的数据。强烈推荐用户同一批次数据使用相同的 
label。这样同一批次数据的重复请求只会被接受一次,保证了 At-Most-Once 当 label 对应的导入作业状态为 CANCELLED 时,该 
label 可以再次被使用。 |
 | column_separator             | 
用于指定导入文件中的列分隔符,默认为\t。如果是不可见字符,则需要加\x作为前缀,使用十六进制来表示分隔符。可以使用多个字符的组合作为列分隔符。例如,hive 
文件的分隔符 \x01,需要指定命令 `-H "column_separator:\x01"`。 |
 | line_delimiter               | 用于指定导入文件中的换行符,默认为 
\n。可以使用做多个字符的组合作为换行符。例如,指定换行符为 \n,需要指定命令 `-H "line_delimiter:\n"`。 |
-| columns                      | 用于指定导入文件中的列和 table 
中的列的对应关系。如果源文件中的列正好对应表中的内容,那么是不需要指定这个字段的内容的。如果源文件与表 schema 
不对应,那么需要这个字段进行一些数据转换。有两种形式 column:直接对应导入文件中的字段,直接使用字段名表示衍生列,语法为 `column_name` = 
expression 详细案例参考“导入过程中数据转换”。 |
+| columns                      | 用于指定导入文件中的列和 table 
中的列的对应关系。如果源文件中的列正好对应表中的内容,那么是不需要指定这个字段的内容的。如果源文件与表 schema 
不对应,那么需要这个字段进行一些数据转换。有两种形式 column:直接对应导入文件中的字段,直接使用字段名表示衍生列,语法为 `column_name` = 
expression 详细案例参考 [导入过程中数据转换](../data-operate/import/load-data-convert)。 |
 | where                        | 
用于抽取部分数据。用户如果有需要将不需要的数据过滤掉,那么可以通过设定这个选项来达到。例如,只导入大于 k1 列等于 20180601 
的数据,那么可以在导入时候指定 `-H "where: k1 = 20180601"`。 |
 | max_filter_ratio             | 最大容忍可过滤(数据不规范等原因)的数据比例,默认零容忍。取值范围是 
0~1。当导入的错误率超过该值,则导入失败。数据不规范不包括通过 where 条件过滤掉的行。例如,最大程度保证所有正确的数据都可以导入(容忍度 
100%),需要指定命令 `-H "max_filter_ratio:1"`。 |
-| partitions                   | 用于指定这次导入所设计的 partition。如果用户能够确定数据对应的 
partition,推荐指定该项。不满足这些分区的数据将被过滤掉。例如,指定导入到 p1, p2 分区,需要指定命令 `-H "partitions: p1, 
p2"`。 |
+| partitions                   | 用于指定这次导入所涉及的 partition。如果用户能够确定数据对应的 
partition,推荐指定该项。不满足这些分区的数据将被过滤掉。例如,指定导入到 p1, p2 分区,需要指定命令 `-H "partitions: p1, 
p2"`。 |
 | timeout                      | 指定导入的超时时间。单位秒。默认是 600 秒。可设置范围为 1 秒 ~ 259200 
秒。例如,指定导入超时时间为 1200s,需要指定命令 `-H "timeout:1200"`。 |
 | strict_mode                  | 用户指定此次导入是否开启严格模式,默认为关闭。例如,指定开启严格模式,需要指定命令 `-H 
"strict_mode:true"`。 |
 | timezone                     | 
指定本次导入所使用的时区。默认为东八区。该参数会影响所有导入涉及的和时区有关的函数结果。例如,指定导入时区为 Africa/Abidjan,需要指定命令 
`-H "timezone:Africa/Abidjan"`。 |
@@ -335,7 +335,7 @@ Stream Load 操作支持 HTTP 分块导入(HTTP chunked)与 HTTP 非分块
 | trim_double_quotes           | 布尔类型,默认值为 false,为 true 时表示裁剪掉 CSV 
文件每个字段最外层的双引号。 |
 | skip_lines                   | 整数类型,默认值为 0,含义为跳过 CSV 文件的前几行。当设置 format 设置为 
`csv_with_names`或`csv_with_names_and_types`时,该参数会失效。 |
 | comment                      | 字符串类型,默认值为空。给任务增加额外的信息。               |
-| enclose                      | 指定包围符。当 CSV 
数据字段中含有行分隔符或列分隔符时,为防止意外截断,可指定单字节字符作为包围符起到保护作用。例如列分隔符为 ",",包围符为 "'",数据为 
"a,'b,c'",则 "b,c" 会被解析为一个字段。注意:当 enclose 设置为`"`时,trim_double_quotes 一定要设置为 
true。|
+| enclose                      | 指定包围符。当 CSV 
数据字段中含有行分隔符或列分隔符时,为防止意外截断,可指定单字节字符作为包围符起到保护作用。例如列分隔符为 ",",包围符为 "'",数据为 
"a,'b,c'",则 "b,c" 会被解析为一个字段。注意:当 enclose 设置为`"`时,trim_double_quotes 一定要设置为 
true。 |
 | escape                       | 指定转义符。用于转义在字段中出现的与包围符相同的字符。例如数据为 
"a,'b,'c'",包围符为 "'",希望 "b,'c 被作为一个字段解析,则需要指定单字节转义符,例如"\",将数据修改为 "a,'b,\'c'"。 |
 
 ### 导入返回值
@@ -398,7 +398,7 @@ Stream Load 是一种同步的导入方式,导入结果会通过创建导入
 使用 TVF http_stream 进行 Stream Load 导入时的 Rest API URL 不同于 Stream Load 普通导入的 URL。
 
 - 普通导入的 URL 为:
-    
+  
     http://fe_host:http_port/api/{db}/{table}/_stream_load
 
 - 使用 TVF http_stream 导入的 URL 为:
@@ -457,7 +457,7 @@ curl --location-trusted -u <doris_user>:<doris_password> \
 
 ### 设置导入最大容错率
 
-Doris 的导入任务可以容忍一部分格式错误的数据。容忍率通过 `max_filter_ratio` 设置。默认为 
0,即表示当有一条错误数据时,整个导入任务将会失败。如果用户希望忽略部分有问题的数据行,可以将次参数设置为 0~1 之间的数值,Doris 
会自动跳过哪些数据格式不正确的行。关于容忍率的一些计算方式,可以参阅 
[数据转化](../../data-operate/import/load-data-convert) 文档。
+Doris 的导入任务可以容忍一部分格式错误的数据。容忍率通过 `max_filter_ratio` 设置。默认为 
0,即表示当有一条错误数据时,整个导入任务将会失败。如果用户希望忽略部分有问题的数据行,可以将次参数设置为 0~1 之间的数值,Doris 
会自动跳过哪些数据格式不正确的行。关于容忍率的一些计算方式,可以参阅 
[数据转换](../../data-operate/import/load-data-convert) 文档。
 
 通过以下命令可以指定 max_filter_ratio 容忍度为 0.4 创建 stream load 导入任务:
 
@@ -507,7 +507,7 @@ curl --location-trusted -u <doris_user>:<doris_password> \
 
 由于 Doris 目前没有内置时区的时间类型,所有 `DATETIME` 相关类型均只表示绝对的时间点,而不包含时区信息,不因 Doris 
系统时区变化而发生变化。因此,对于带时区数据的导入,我们统一的处理方式为将其转换为特定目标时区下的数据。在 Doris 系统中,即 session 
variable `time_zone` 所代表的时区。
 
-而在导入中,我们的目标时区通过参数 `timezone` 指定,该变量在发生时区转换、运算时区敏感函数时将会替代 session variable 
`time_zone`。因此,如果没有特殊情况,在导入事务中应当设定 `timezone` 与当前 Doris 集群的 `time_zone` 
一致。此时意味着所有带时区的时间数据,均会发生向该时区的转换。例如,Doris 系统时区为 "+08:00",导入数据中的时间列包含两条数据,分别为 
"2012-01-01 01:00:00Z" 和 "2015-12-12 12:12:12-08:00",则我们在导入时通过 `-H "timezone: 
+08:00"` 指定导入事务的时区后,这两条数据都会向该时区发生转换,从而得到结果 "2012-01-01 09:00:00" 和 "2015-12-13 
04:12:12"。
+而在导入中,我们的目标时区通过参数 `timezone` 指定,该变量在发生时区转换、运算时区敏感函数时将会替代 session variable 
`time_zone`。因此,如果没有特殊情况,在导入事务中应当设定 `timezone` 与当前 Doris 集群的 `time_zone` 
一致。此时意味着所有带时区的时间数据,均会发生向该时区的转换。例如,Doris 系统时区为 "+08:00",导入数据中的时间列包含两条数据,分别为 
"2012-01-01 01:00:00" 和 "2015-12-12 12:12:12-08:00",则我们在导入时通过 `-H "timezone: 
+08:00"` 指定导入事务的时区后,这两条数据都会向该时区发生转换,从而得到结果 "2012-01-01 09:00:00" 和 "2015-12-13 
04:12:12"。
 
 ### 使用 Streaming 方式导入
 
@@ -549,7 +549,7 @@ curl --location-trusted -u <doris_user>:<doris_password> \
     -XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_streamload/_stream_load
 ```
 
-如导入数据前标重数据为:
+如导入数据前表中数据为:
 
 ```sql
 +--------+----------+----------+------+
@@ -567,7 +567,7 @@ curl --location-trusted -u <doris_user>:<doris_password> \
 3,2,tom,0
 ```
 
-导入后会删除原表数据,变成一下结果集
+导入后会删除原表数据,变成以下结果集
 
 ```sql
 +--------+----------+----------+------+
@@ -742,7 +742,7 @@ curl --location-trusted -u <doris_user>:<doris_password> \
 张三,30,'上海市,黄浦区,\'大沽路'
 ```
 
-可以通过 escape 参数可以指定单字节转义字符,如上例中 `\`:
+可以通过 escape 参数可以指定单字节转义字符,如下例中 `\`:
 
 ```sql
 curl --location-trusted -u <doris_user>:<doris_password> \
@@ -1001,7 +1001,7 @@ AGGREGATE KEY(typ_id,hou)
 DISTRIBUTED BY HASH(typ_id,hou) BUCKETS 10;
 ```
 
-通过以 to_bitmap 可以将数据转化成 Bitmap 类型:
+通过以 to_bitmap 可以将数据转换成 Bitmap 类型:
 
 ```sql
 curl --location-trusted -u <doris_user>:<doris_password> \
@@ -1013,7 +1013,7 @@ curl --location-trusted -u <doris_user>:<doris_password> \
 
 ### 导入 HLL 数据类型
 
-通过 hll_hash 函数可以将数据转化成 hll 类型,如下数据:
+通过 hll_hash 函数可以将数据转换成 hll 类型,如下数据:
 
 ```SQL
 1001|koga
@@ -1056,7 +1056,7 @@ Doris 中所有导入任务都是原子生效的。并且在同一个导入任
 
 ### 列映射、衍生列和过滤
 
-Doris 可以在导入语句中支持非常丰富的列转换和过滤操作。支持绝大多数内置函数和 UDF。关于如何正确的使用这个功能,可参阅 
[数据转化](../../data-operate/import/load-data-convert) 文档。
+Doris 可以在导入语句中支持非常丰富的列转换和过滤操作。支持绝大多数内置函数和 UDF。关于如何正确的使用这个功能,可参阅 
[数据转换](../../data-operate/import/load-data-convert) 文档。
 
 ### 启用严格模式导入
 
@@ -1064,8 +1064,8 @@ Doris 可以在导入语句中支持非常丰富的列转换和过滤操作。
 
 ### 导入时进行部分列更新
 
-关于导入时,如何表达部分列更新,可以参考 数据操作/数据更新 文档
+关于导入时,如何表达部分列更新,可以参考 [数据操作/数据更新](../) 文档
 
 ## 更多帮助
 
-关于 Stream Load 使用的更多详细语法及最佳实践,请参阅 [Stream 
Load](../../sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD)
 命令手册,你也可以在 MySql 客户端命令行下输入 `HELP STREAM LOAD` 获取更多帮助信息。
\ No newline at end of file
+关于 Stream Load 使用的更多详细语法及最佳实践,请参阅 [Stream 
Load](../../sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD)
 命令手册,你也可以在 MySQL 客户端命令行下输入 `HELP STREAM LOAD` 获取更多帮助信息。
\ No newline at end of file
diff --git 
a/versioned_docs/version-2.0/data-operate/import/stream-load-manual.md 
b/versioned_docs/version-2.0/data-operate/import/stream-load-manual.md
index 21482df72f84..8ec7695a8e6d 100644
--- a/versioned_docs/version-2.0/data-operate/import/stream-load-manual.md
+++ b/versioned_docs/version-2.0/data-operate/import/stream-load-manual.md
@@ -24,11 +24,9 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# Stream Load
+Stream Load supports importing local files or data streams into Doris through 
the HTTP protocol. 
 
-Stream load is a synchronous way of importing. Users import local files or 
data streams into Doris by sending HTTP protocol requests. Stream load 
synchronously executes the import and returns the import result. Users can 
directly determine whether the import is successful by the return body of the 
request.
-
-Stream load is mainly suitable for importing local files or data from data 
streams through procedures.
+Stream Load is a synchronous import method: it executes the import and returns 
the result, so you can determine whether the import succeeded directly from the 
response body. Generally, users can use Stream Load to import files under 10GB. 
If the file is too large, it is recommended to split the file and then use 
Stream Load for importing. Stream Load ensures the atomicity of a batch of 
imported data, meaning that either all of it succeeds or all of it fails.
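+
+For the "split the file" recommendation above, here is a rough, hypothetical sketch (the file name, line count, credentials, and hosts are placeholders to adapt to your data):
+
+```Bash
+# Split a large CSV into chunks of about one million lines, then submit each
+# chunk as its own Stream Load job; every chunk commits atomically on its own.
+split -l 1000000 big_file.csv part_
+for f in part_*; do
+    curl --location-trusted -u <doris_user>:<doris_password> \
+        -H "Expect:100-continue" \
+        -H "column_separator:," \
+        -T "$f" \
+        -XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_streamload/_stream_load
+done
+```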
 
 :::tip
 
@@ -43,47 +41,1041 @@ In comparison to single-threaded load using `curl`, Doris 
Streamloader is a clie
 See [Doris 
Streamloader](https://doris.apache.org/docs/ecosystem/doris-streamloader/) for 
detailed instructions and best practices.
 :::
 
-## Basic Principles
+## User guide
+
+### Supported formats
+
+Stream Load supports importing data in CSV, JSON, Parquet, and ORC formats.
+
+### Usage limitations
+
+When importing CSV files, it is necessary to clearly distinguish between null 
values and empty strings:
+
+- Null values need to be represented by `\N`. For example, the data "a,\N,b" 
indicates that the middle column is a null value.
+- Empty strings can be represented by leaving the data empty. For example, the 
data "a, ,b" indicates that the middle column is an empty string.
+
+### Basic principles
+
+When using Stream Load, it is necessary to initiate an import job through the 
HTTP protocol to the FE (Frontend) node. The FE will redirect the request to a 
BE (Backend) node in a round-robin manner to achieve load balancing. It is also 
possible to send HTTP requests directly to a specific BE node. In Stream Load, 
Doris selects one node to serve as the Coordinator node. The Coordinator node 
is responsible for receiving data and distributing it to other nodes.
 
 The following figure shows the main flow of Stream load, omitting some import 
details.
 
+![Basic principles of Stream Load](/images/stream-load.png)
+
+1. The client submits a Stream Load import job request to the FE (Frontend).
+2. The FE randomly selects a BE (Backend) as the Coordinator node, which is 
responsible for scheduling the import job, and then returns an HTTP redirect to 
the client.
+3. The client connects to the Coordinator BE node and submits the import 
request.
+4. The Coordinator BE distributes the data to the appropriate BE nodes and 
returns the import result to the client once the import is complete.
+5. Alternatively, the client can directly specify a BE node as the Coordinator 
and distribute the import job directly.
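+
+For step 5, a hedged sketch of submitting directly to a chosen BE as the Coordinator is shown below (the BE host and HTTP port are placeholders; the BE HTTP port is commonly 8040, but verify it against your deployment's `webserver_port`):
+
+```Bash
+# Bypass the FE redirect and send the load request straight to one BE node,
+# which then acts as the Coordinator for this job.
+curl --location-trusted -u <doris_user>:<doris_password> \
+    -H "Expect:100-continue" \
+    -H "column_separator:," \
+    -T streamload_example.csv \
+    -XPUT http://<be_ip>:<be_http_port>/api/testdb/test_streamload/_stream_load
+```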
+
+## Quick start
+
+Stream Load imports data through the HTTP protocol. The following example uses 
the curl tool to demonstrate submitting an import job through Stream Load.
+
+For detailed syntax, please refer to [STREAM 
LOAD](../../sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD).
+
+### Prerequisite check
+
+Stream Load requires `INSERT` privileges on the target table. If the user does 
not have `INSERT` privileges, they can be granted through the 
[GRANT](../../sql-manual/sql-reference/Account-Management-Statements/GRANT) 
command.
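+
+A hypothetical example of granting the privilege through the MySQL client is shown below; it assumes the `LOAD_PRIV` privilege covers Stream Load in your Doris version, and the host, port, and account names are placeholders:
+
+```Bash
+# Grant load privileges on the target table to an example account.
+mysql -h <fe_ip> -P <fe_query_port> -uroot -p -e \
+    "GRANT LOAD_PRIV ON testdb.test_streamload TO 'streamload_user'@'%';"
+```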
+
+### Create load job
+
+#### Loading CSV 
+
+1. Creating loading data
+
+   Create a CSV file named `streamload_example.csv`. The specific content is 
as follows
+
+```sql
+1,Emily,25
+2,Benjamin,35
+3,Olivia,28
+4,Alexander,60
+5,Ava,17
+6,William,69
+7,Sophia,32
+8,James,64
+9,Emma,37
+10,Liam,64
+```
+
+2. Creating a table for loading
+
+   Create the table that will be imported into, using the specific syntax as 
follows:
+
+```sql
+CREATE TABLE testdb.test_streamload(
+    user_id            BIGINT       NOT NULL COMMENT "用户 ID",
+    name               VARCHAR(20)           COMMENT "用户姓名",
+    age                INT                   COMMENT "用户年龄"
+)
+DUPLICATE KEY(user_id)
+DISTRIBUTED BY HASH(user_id) BUCKETS 10;
+```
+
+3. Enabling the load job
+
+   The Stream Load job can be submitted using the `curl` command.
+
+```Bash
+curl --location-trusted -u <doris_user>:<doris_password> \
+    -H "Expect:100-continue" \
+    -H "column_separator:," \
+    -H "columns:user_id,name,age" \
+    -T streamload_example.csv \
+    -XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_streamload/_stream_load
+```
+
+   Stream Load is a synchronous method, where the result is directly returned 
to the user.
+
+```sql
+{
+    "TxnId": 3,
+    "Label": "123",
+    "Comment": "",
+    "TwoPhaseCommit": "false",
+    "Status": "Success",
+    "Message": "OK",
+    "NumberTotalRows": 10,
+    "NumberLoadedRows": 10,
+    "NumberFilteredRows": 0,
+    "NumberUnselectedRows": 0,
+    "LoadBytes": 118,
+    "LoadTimeMs": 173,
+    "BeginTxnTimeMs": 1,
+    "StreamLoadPutTimeMs": 70,
+    "ReadDataTimeMs": 2,
+    "WriteDataTimeMs": 48,
+    "CommitAndPublishTimeMs": 52
+}
+```
+
+4. View data
+
+```sql
+mysql> select count(*) from testdb.test_streamload;
++----------+
+| count(*) |
++----------+
+|       10 |
++----------+
+```
+
+#### Loading JSON 
+
+1. Creating loading data
+
+Create a JSON file named `streamload_example.json`. The specific content is as 
follows:
+
+```sql
+[
+{"userid":1,"username":"Emily","userage":25},
+{"userid":2,"username":"Benjamin","userage":35},
+{"userid":3,"username":"Olivia","userage":28},
+{"userid":4,"username":"Alexander","userage":60},
+{"userid":5,"username":"Ava","userage":17},
+{"userid":6,"username":"William","userage":69},
+{"userid":7,"username":"Sophia","userage":32},
+{"userid":8,"username":"James","userage":64},
+{"userid":9,"username":"Emma","userage":37},
+{"userid":10,"username":"Liam","userage":64}
+]
+```
+
+2. Creating a table for loading
+
+   Create the table that will be imported into, using the specific syntax as 
follows:
+
+```sql
+CREATE TABLE testdb.test_streamload(
+    user_id            BIGINT       NOT NULL COMMENT "用户 ID",
+    name               VARCHAR(20)           COMMENT "用户姓名",
+    age                INT                   COMMENT "用户年龄"
+)
+DUPLICATE KEY(user_id)
+DISTRIBUTED BY HASH(user_id) BUCKETS 10;
+```
+
+3. Enabling the load job
+
+   The Stream Load job can be submitted using the `curl` command.
+
+```Bash
+curl --location-trusted -u <doris_user>:<doris_password> \
+    -H "label:124" \
+    -H "Expect:100-continue" \
+    -H "format:json" -H "strip_outer_array:true" \
+    -H "jsonpaths:[\"$.userid\", \"$.username\", \"$.userage\"]" \
+    -H "columns:user_id,name,age" \
+    -T streamload_example.json \
+    -XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_streamload/_stream_load
+```
+
+   Stream Load is a synchronous method, where the result is directly returned 
to the user.
+
+```sql
+{
+    "TxnId": 7,
+    "Label": "125",
+    "Comment": "",
+    "TwoPhaseCommit": "false",
+    "Status": "Success",
+    "Message": "OK",
+    "NumberTotalRows": 10,
+    "NumberLoadedRows": 10,
+    "NumberFilteredRows": 0,
+    "NumberUnselectedRows": 0,
+    "LoadBytes": 471,
+    "LoadTimeMs": 52,
+    "BeginTxnTimeMs": 0,
+    "StreamLoadPutTimeMs": 11,
+    "ReadDataTimeMs": 0,
+    "WriteDataTimeMs": 23,
+    "CommitAndPublishTimeMs": 16
+}
+```
+
+### View load job
+
+By default, Stream Load synchronously returns results to the client, so the 
system does not record Stream Load historical jobs. If recording is required, 
add the configuration `enable_stream_load_record=true` in `be.conf`. Refer to 
the [BE configuration 
options](https://doris.apache.org/zh-CN/docs/admin-manual/config/be-config) for 
specific details.
+
+After configuring, you can use the `show stream load` command to view 
completed Stream Load jobs.
+
+```sql
+mysql> show stream load from testdb;
++-------+--------+-----------------+---------------+---------+---------+------+-----------+------------+--------------+----------------+-----------+-------------------------+-------------------------+------+---------+
+| Label | Db     | Table           | ClientIp      | Status  | Message | Url  
| TotalRows | LoadedRows | FilteredRows | UnselectedRows | LoadBytes | 
StartTime               | FinishTime              | User | Comment |
++-------+--------+-----------------+---------------+---------+---------+------+-----------+------------+--------------+----------------+-----------+-------------------------+-------------------------+------+---------+
+| 12356 | testdb | test_streamload | 192.168.88.31 | Success | OK      | N/A  
| 10        | 10         | 0            | 0              | 118       | 
2023-11-29 08:53:00.594 | 2023-11-29 08:53:00.650 | root |         |
++-------+--------+-----------------+---------------+---------+---------+------+-----------+------------+--------------+----------------+-----------+-------------------------+-------------------------+------+---------+
+1 row in set (0.00 sec)
+```
+
+### Cancel load job
+
+Users cannot manually cancel a Stream Load operation. A Stream Load job will 
be automatically canceled by the system if it times out or encounters an import 
error.
+
+## Reference manual
+
+### Command
+
+The syntax for Stream Load is as follows:
+
+```Bash
+curl --location-trusted -u <doris_user>:<doris_password> \
+  -H "Expect:100-continue" [-H ""...] \
+  -T <file_path> \
+  -XPUT http://fe_host:http_port/api/{db}/{table}/_stream_load
+```
+
+Stream Load operations support both HTTP chunked and non-chunked import 
methods. For non-chunked imports, it is necessary to have a Content-Length 
header to indicate the length of the uploaded content, which ensures data 
integrity.
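+
+The two curl invocations below sketch the difference (hosts and credentials are placeholders): uploading a regular file lets curl derive and send the Content-Length header itself, while reading from a pipe with `-T -` typically makes curl fall back to HTTP chunked transfer:
+
+```Bash
+# Non-chunked: curl computes Content-Length from the file size.
+curl --location-trusted -u <doris_user>:<doris_password> \
+    -H "Expect:100-continue" \
+    -H "column_separator:," \
+    -T streamload_example.csv \
+    -XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_streamload/_stream_load
+
+# Chunked: data is streamed from stdin, so no Content-Length is known upfront.
+seq 1 10 | awk '{OFS=","}{print $1, "user_"$1, $1 + 20}' | \
+curl --location-trusted -u <doris_user>:<doris_password> \
+    -H "Expect:100-continue" \
+    -H "column_separator:," \
+    -T - \
+    -XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_streamload/_stream_load
+```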
+
+### Load configuration parameters
+
+#### FE configuration
+
+1. `stream_load_default_timeout_second`
+
+   - Default value: 259200 (s)
+   - Dynamic configuration: Yes
+   - FE Master-only configuration: Yes
+
+   Parameter description: The default timeout for Stream Load. The load job 
will be canceled by the system if it is not completed within the set timeout 
(in seconds). If the source file cannot be imported within the specified time, 
the user can set an individual timeout in the Stream Load request. 
Alternatively, adjust the `stream_load_default_timeout_second` parameter on the 
FE to set the global default timeout, as shown in the sketch after this list.
+
+2. `enable_pipeline_load`
+
+   Determines whether to enable the Pipeline engine to execute Stream Load 
tasks. See the [import](./load-manual) documentation for more details.
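+
+For the dynamic `stream_load_default_timeout_second` option described in item 1, a hedged runtime-adjustment sketch via the MySQL client is shown below (host, port, account, and the 3600-second value are placeholders):
+
+```Bash
+# Adjust the global default Stream Load timeout on the FE master at runtime.
+mysql -h <fe_ip> -P <fe_query_port> -uroot -p -e \
+    'ADMIN SET FRONTEND CONFIG ("stream_load_default_timeout_second" = "3600");'
+```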
+
+#### BE configuration
+
+1. `streaming_load_max_mb`
+
+   - Default value: 10240 (MB)
+   - Dynamic configuration: Yes
+   - Parameter description: The maximum import size for Stream load. If the 
user's original file exceeds this value, the `streaming_load_max_mb` parameter 
on the BE needs to be adjusted.
+
+2. Header parameters
+
+   Load parameters can be passed through the HTTP Header section. See below 
for specific parameter descriptions.
+
+| Parameters                   | Parameters description                        
               |
+| ---------------------------- | 
------------------------------------------------------------ |
+| label                        | Used to specify a label for this Doris 
import. Data with the same label cannot be imported multiple times. If no label 
is specified, Doris will automatically generate one. Users can avoid duplicate 
imports of the same data by specifying a label. Doris retains import job labels 
for three days by default, but this duration can be adjusted using 
`label_keep_max_second`. For example, to specify the label for this import as 
123, use the command `-H "label:123" [...]
+| column_separator             | Used to specify the column separator in the 
import file, which defaults to `\t`. If the separator is an invisible 
character, it needs to be prefixed with `\x` and represented in hexadecimal 
format. Multiple characters can be combined as a column separator. For example, 
to specify the separator as `\x01` for a Hive file, use the command `-H 
"column_separator:\x01"`. |
+| line_delimiter               | Used to specify the line delimiter in the 
import file, which defaults to `\n`. Multiple characters can be combined as a 
line delimiter. For example, to specify the line delimiter as `\n`, use the 
command `-H "line_delimiter:\n"`. |
+| columns                      | Used to specify the correspondence between 
columns in the import file and columns in the table. If the columns in the 
source file exactly match the content of the table, this field does not need to 
be specified. If the schema of the source file does not match the table, this 
field is required for data transformation. There are two formats: direct column 
correspondence to fields in the import file, and derived columns represented by 
expressions. Refer to [ [...]
+| where                        | Used to filter out unnecessary data. If users 
need to exclude certain data, they can achieve this by setting this option. For 
example, to import only data where the k1 column is equal to 20180601, specify 
`-H "where: k1 = 20180601"` during the import. |
+| max_filter_ratio             | Used to specify the maximum tolerable ratio 
of filterable (irregular or otherwise problematic) data, which defaults to zero 
tolerance. The value range is 0 to 1. If the error rate of the imported data 
exceeds this value, the import will fail. Irregular data does not include rows 
filtered out by the where condition. For example, to maximize the import of all 
correct data (100% tolerance), specify the command `-H "max_filter_ratio:1"`. |
+| partitions                   | Used to specify the partitions involved in 
this import. If users can determine the corresponding partitions for the data, 
it is recommended to specify this option. Data that does not meet these 
partition criteria will be filtered out. For example, to specify importing into 
partitions p1 and p2, use the command `-H "partitions: p1, p2"`. |
+| timeout                      | Used to specify the timeout for the import in 
seconds. The default is 600 seconds, and the configurable range is from 1 
second to 259200 seconds. For example, to specify an import timeout of 1200 
seconds, use the command `-H "timeout:1200"`. |
+| strict_mode                  | Used to specify whether to enable strict mode 
for this import, which is disabled by default. For example, to enable strict 
mode, use the command `-H "strict_mode:true"`. |
+| timezone                     | Used to specify the timezone to be used for 
this import, which defaults to GMT+8. This parameter affects the results of all 
timezone-related functions involved in the import. For example, to specify the 
import timezone as Africa/Abidjan, use the command `-H 
"timezone:Africa/Abidjan"`. |
+| exec_mem_limit               | The memory limit for the import, which 
defaults to 2GB. The unit is bytes. |
+| format                       | Used to specify the format of the imported 
data, which defaults to CSV. Currently supported formats include: csv, json, 
csv_with_names (supports filtering the first row of the csv file), 
csv_with_names_and_types (supports filtering the first two rows of the csv 
file), parquet, and orc. For example, to specify the imported data format as 
json, use the command `-H "format:json"`. |
+| jsonpaths                    | There are two ways to import JSON data 
format: Simple Mode and Matching Mode. If no jsonpaths are specified, it is the 
simple mode, which requires the JSON data to be of the object type. Matching 
mode is used when the JSON data is relatively complex and requires matching the 
corresponding values through the jsonpaths parameter. In simple mode, the keys 
in JSON are required to correspond one-to-one with the column names in the 
table. For example, in the JSON dat [...]
+| strip_outer_array            | When `strip_outer_array` is set to true, it 
indicates that the JSON data starts with an array object and flattens the 
objects within the array. The default value is false. When the outermost layer 
of the JSON data is represented by `[]`, which denotes an array, 
`strip_outer_array` should be set to true. For example, with the following 
data, setting `strip_outer_array` to true will result in two rows of data being 
generated when imported into Doris: `[{"k1 [...]
+| json_root                    | `json_root` is a valid jsonpath string that 
specifies the root node of a JSON document, with a default value of "". |
+| merge_type                   | There are three types of data merging: 
APPEND, DELETE, and MERGE. APPEND is the default value, indicating that this 
batch of data needs to be appended to the existing data. DELETE means to remove 
all rows that have the same keys as this batch of data. MERGE semantics need to 
be used in conjunction with delete conditions. It means that data satisfying 
the delete conditions will be processed according to DELETE semantics, while 
the rest will be processed ac [...]
+| delete                       | It is only meaningful under MERGE, 
representing the deletion conditions for data. |
+| function_column.sequence_col | It is suitable only for the UNIQUE KEYS 
model. Within the same Key column, it ensures that the Value column is replaced 
according to the specified source_sequence column. The source_sequence can 
either be a column from the data source or an existing column in the table 
structure. |
+| fuzzy_parse                  | It is a boolean type. If set to true, the 
JSON will be parsed with the first row as the schema. Enabling this option can 
improve the efficiency of JSON imports, but it requires that the order of the 
keys in all JSON objects be consistent with the first line. The default is 
false and it is only used for JSON format. |
+| num_as_string                | It is a boolean type. When set to true, 
indicates that numeric types will be converted to strings during JSON parsing 
to ensure no loss of precision during the import process. |
+| read_json_by_line            | It is a boolean type. When set to true, 
indicates support for reading one JSON object per line, defaulting to false. |
+| send_batch_parallelism       | An integer used to set the parallelism for 
sending batch-processed data. If the parallelism value exceeds the 
`max_send_batch_parallelism_per_job` configured in BE, the coordinating BE will 
use the `max_send_batch_parallelism_per_job value`. |
+| hidden_columns               | Used to specify hidden columns in the 
imported data, which takes effect when the Header does not include Columns. 
Multiple hidden columns are separated by commas. The system will use the 
user-specified data for import. In the following example, the last column of 
data in the imported data is `__DORIS_SEQUENCE_COL__`. `hidden_columns: 
__DORIS_DELETE_SIGN__,__DORIS_SEQUENCE_COL__`. |
+| load_to_single_tablet        | It is a boolean type. When set to true, 
indicates support for importing data only to a single Tablet corresponding to 
the partition, defaulting to false. This parameter is only allowed when 
importing to an OLAP table with random bucketing. |
+| compress_type                | Currently, only compression of CSV files is 
supported. Compression formats include gz, lzo, bz2, lz4, lzop, and deflate. |
+| trim_double_quotes           | It is a boolean type. When set to true, 
indicates trimming of the outermost double quotes for each field in the CSV 
file, defaulting to false. |
+| skip_lines                   | It is an integer type. Used to specify the 
number of lines to skip at the beginning of the CSV file, defaulting to 0. When 
the `format` is set to `csv_with_names` or `csv_with_names_and_types`, this 
parameter will become invalid. |
+| comment                      | It is a String type, with an empty string as 
the default value. Used to add additional information to the task. |
+| enclose                      | Specify the enclosure character. When a CSV 
data field contains a row delimiter or column delimiter, to prevent unexpected 
truncation, you can specify a single-byte character as the enclosure for 
protection. For example, if the column delimiter is "," and the enclosure is 
"'", the data "a,'b,c'" will have "b,c" parsed as a single field. Note: When 
the enclosure is set to a double quote ("), make sure to set 
`trim_double_quotes` to true. |
+| escape                       | Specify the escape character. It is used to 
escape characters that are the same as the enclosure character within a field. 
For example, if the data is "a,'b,'c'", and the enclosure is "'", and you want 
"b,'c" to be parsed as a single field, you need to specify a single-byte escape 
character, such as "\", and modify the data to "a,'b,\'c'". |
+
+### Load return value
+
+Stream Load is a synchronous import method, and the load result is returned to 
the user directly in the response body, as shown below:
+
+```sql
+{
+    "TxnId": 1003,
+    "Label": "b6f3bc78-0d2c-45d9-9e4c-faa0a0149bee",
+    "Status": "Success",
+    "ExistingJobStatus": "FINISHED", // optional
+    "Message": "OK",
+    "NumberTotalRows": 1000000,
+    "NumberLoadedRows": 1000000,
+    "NumberFilteredRows": 1,
+    "NumberUnselectedRows": 0,
+    "LoadBytes": 40888898,
+    "LoadTimeMs": 2144,
+    "BeginTxnTimeMs": 1,
+    "StreamLoadPutTimeMs": 2,
+    "ReadDataTimeMs": 325,
+    "WriteDataTimeMs": 1933,
+    "CommitAndPublishTimeMs": 106,
+    "ErrorURL": 
"http://192.168.1.1:8042/api/_load_error_log?file=__shard_0/error_log_insert_stmt_db18266d4d9b4ee5-abb00ddd64bdf005_db18266d4d9b4ee5_abb00ddd64bdf005";
+}
+```
+
+The return result parameters are explained in the following table:
+
+| Parameters             | Parameters description                              
         |
+| ---------------------- | 
------------------------------------------------------------ |
+| TxnId                  | Import transaction ID                               
         |
+| Label                  | Label of the load job, specified via `-H 
"label:<label_id>"`. |
+| Status                 | Final load status. **Success**: The load job was 
successful. **Publish Timeout**: The load job has been completed, but there may 
be a delay in data visibility. **Label Already Exists**: The label is 
duplicated, requiring a new label. **Fail**: The load job failed. |
+| ExistingJobStatus      | The status of the load job corresponding to the 
already existing label. This field is only displayed when the Status is **Label 
Already Exists**. Users can use this status to know the status of the import 
job corresponding to the existing label. **RUNNING** means the job is still 
executing, and **FINISHED** means the job was successful. |
+| Message                | Error information related to the load job.          
         |
+| NumberTotalRows        | The total number of rows processed during the load 
job.      |
+| NumberLoadedRows       | The number of rows that were successfully loaded.   
         |
+| NumberFilteredRows     | The number of rows that did not meet the data 
quality standards. |
+| NumberUnselectedRows   | The number of rows that were filtered out based on 
the WHERE condition. |
+| LoadBytes              | The amount of data in bytes.                        
         |
+| LoadTimeMs             | The time taken for the load job to complete, 
measured in milliseconds. |
+| BeginTxnTimeMs         | The time taken to request the initiation of a 
transaction from the Frontend node (FE), measured in milliseconds. |
+| StreamLoadPutTimeMs    | The time taken to request the execution plan for 
the load job data from the FE, measured in milliseconds. |
+| ReadDataTimeMs         | The time spent reading the data during the load 
job, measured in milliseconds. |
+| WriteDataTimeMs        | The time taken to perform the data writing 
operations during the load job, measured in milliseconds. |
+| CommitAndPublishTimeMs | The time taken to request the commit and publish 
the transaction from the FE, measured in milliseconds. |
+| ErrorURL               | If there are data quality issues, users can access 
this URL to view the specific rows with errors. |
+
+Users can access the ErrorURL to review data that failed to import due to 
issues with data quality. By executing the command `curl "<ErrorURL>"`, users 
can directly retrieve information about the erroneous data.
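+
+A small end-to-end sketch is shown below; it assumes the `jq` tool is available for pulling the ErrorURL out of the JSON response, and all hosts, credentials, and file names are placeholders:
+
+```Bash
+# Submit a load, capture the JSON response, then fetch ErrorURL (if any) to
+# inspect the rows that failed data quality checks.
+RESP=$(curl -s --location-trusted -u <doris_user>:<doris_password> \
+    -H "Expect:100-continue" \
+    -H "column_separator:," \
+    -T streamload_example.csv \
+    -XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_streamload/_stream_load)
+ERROR_URL=$(echo "$RESP" | jq -r '.ErrorURL // empty')
+[ -n "$ERROR_URL" ] && curl "$ERROR_URL"
+```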
+
+## Application of Table Value Function in Stream Load - http_stream Mode
+
+Leveraging the recently introduced Table Value Function (TVF) capability in 
Doris, Stream Load now allows import parameters to be expressed through SQL 
statements. A dedicated TVF named `http_stream` is provided for Stream Load 
operations.
+
+:::tip
+
+When performing Stream Load using the TVF `http_stream`, the Rest API URL 
differs from the standard URL used for regular Stream Load imports.
+
+- Standard Stream Load URL:
+  `http://fe_host:http_port/api/{db}/{table}/_stream_load`
+- URL for Stream Load using TVF `http_stream`:
+  `http://fe_host:http_port/api/_http_stream`
+
+:::
+
+Using curl for Stream Load in http_stream Mode:
+
+```Bash
+curl --location-trusted -u user:passwd [-H "sql: ${load_sql}"...] -T data.file 
-XPUT http://fe_host:http_port/api/_http_stream
+```
+
+Adding a SQL parameter in the header to replace the previous parameters such 
as `column_separator`, `line_delimiter`, `where`, `columns`, etc., makes it 
very convenient to use.
+
+Example of load SQL:
+
+```Bash
+insert into db.table (col, ...) select stream_col, ... from 
http_stream("property1"="value1");
+```
+
+http_stream parameters:
+
+- "column_separator" = ","
+- "format" = "CSV"
+- ...
+
+For example:
+
+```Plain
+curl  --location-trusted -u root: -T test.csv  -H "sql:insert into 
demo.example_tbl_1(user_id, age, cost) select c1, c4, c7 * 2 from 
http_stream(\"format\" = \"CSV\", \"column_separator\" = \",\" ) where age >= 
30"  http://127.0.0.1:28030/api/_http_stream
+```
+
+## Load example
+
+### Setting load timeout and maximum size
+
+The timeout for a load job is measured in seconds. If the load job is not 
completed within the specified timeout period, it will be cancelled by the 
system and marked as `CANCELLED`. You can adjust the timeout for a Stream Load 
job by specifying the `timeout` parameter or adding the 
`stream_load_default_timeout_second` parameter in the fe.conf file.
+
+Before initiating the load, you need to calculate the timeout based on the 
file size. For example, for a 100GB file with an estimated load performance of 
50MB/s:
+
+```
+Load time ≈ 100GB / 50MB/s ≈ 2048s
+```
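+
+The same arithmetic as a small shell sketch (the size and throughput figures are just the example values above):
+
+```Bash
+# Estimate a timeout from file size and assumed load throughput.
+FILE_SIZE_MB=$((100 * 1024))   # 100 GB expressed in MB
+SPEED_MB_PER_S=50
+echo "suggested timeout: $((FILE_SIZE_MB / SPEED_MB_PER_S))s"   # prints 2048s
+```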
+
+You can use the following command to specify a timeout of 3000 seconds for 
creating a Stream Load job:
+
+```Shell
+curl --location-trusted -u <doris_user>:<doris_password> \
+    -H "Expect:100-continue" \
+    -H "timeout:3000"
+    -H "column_separator:," \
+    -H "columns:user_id,name,age" \
+    -T streamload_example.csv \
+    -XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_streamload/_stream_load
+```
+
+### Setting maximum error tolerance rate 
+
+A load job can tolerate a certain amount of data with formatting errors. The 
tolerance rate is configured using the `max_filter_ratio` parameter. By 
default, it is set to 0, meaning that if there is even a single erroneous data 
row, the entire load job will fail. If users wish to ignore some problematic 
data rows, they can set this parameter to a value between 0 and 1. Doris will 
automatically skip rows with incorrect data formats. For more information on 
calculating the tolerance rate, pl [...]
+
+You can use the following command to specify a `max_filter_ratio` tolerance of 
0.4 for creating a Stream Load job:
+
+```Bash
+curl --location-trusted -u <doris_user>:<doris_password> \
+    -H "Expect:100-continue" \
+    -H "max_filter_ratio:0.4" \
+    -H "column_separator:," \
+    -H "columns:user_id,name,age" \
+    -T streamload_example.csv \
+    -XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_streamload/_stream_load
+```
+
+### Setting load filtering conditions
+
+During the load job, you can use the WHERE parameter to apply conditional 
filtering to the imported data. The filtered data will not be included in the 
calculation of the filter ratio and will not affect the setting of 
`max_filter_ratio`. After the load job is complete, you can view the number of 
filtered rows by checking `num_rows_unselected`.
+
+You can use the following command to specify WHERE filtering conditions for 
creating a Stream Load job:
+
+```sql
+curl --location-trusted -u <doris_user>:<doris_password> \
+    -H "Expect:100-continue" \
+    -H "where:age>=35" \
+    -H "column_separator:," \
+    -H "columns:user_id,name,age" \
+    -T streamload_example.csv \
+    -XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_streamload/_stream_load
+```
+
+### Loading data into specific partitions
+
+Loading data from local files into partitions p1 and p2 of the table, allowing 
a 20% error rate.
+
+```Bash
+curl --location-trusted -u <doris_user>:<doris_password> \
+    -H "label:123" \
+    -H "Expect:100-continue" \
+    -H "max_filter_ratio:0.2" \
+    -H "column_separator:," \
+    -H "columns:user_id,name,age" \
+    -H "partitions: p1, p2" \ 
+    -T streamload_example.csv \
+    -XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_streamload/_stream_load
+```
+
+### Loading data into specific timezone
+
+As Doris currently does not have a built-in time zone for time types, all 
DATETIME-related types represent absolute time points without timezone 
information and are not affected by changes in the Doris system timezone. 
Therefore, for importing data with timezones, our unified approach is to 
convert it to data in a specific target timezone. In the Doris system, this 
refers to the timezone represented by the session variable `time_zone`.
+
+During a load job, our target timezone is specified through the parameter 
`timezone`. This variable overrides the session variable `time_zone` when 
performing timezone conversions or evaluating timezone-sensitive functions. 
Thus, unless there are special circumstances, the `timezone` should be set 
consistently with the current Doris cluster's `time_zone` during the import 
transaction. This means that all time data with timezones will be converted to 
this timezone.
+
+For example, if the Doris system timezone is "+08:00" and the imported data 
contains two time entries: "2012-01-01 01:00:00" and "2015-12-12 
12:12:12-08:00", specifying the import transaction's timezone as "+08:00" via 
`-H "timezone: +08:00"` will result in both entries being converted to this 
timezone, yielding "2012-01-01 09:00:00" and "2015-12-13 04:12:12".
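+
+A hedged example of specifying the load timezone through the header described above (hosts, credentials, and the data file are placeholders):
+
+```Bash
+# Pin the import transaction's timezone so timezone-aware values are converted
+# consistently with the cluster's time_zone setting.
+curl --location-trusted -u <doris_user>:<doris_password> \
+    -H "Expect:100-continue" \
+    -H "timezone: +08:00" \
+    -H "column_separator:," \
+    -T streamload_example.csv \
+    -XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_streamload/_stream_load
+```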
+
+### Streaming import
+
+Stream Load is based on the HTTP protocol for importing, which supports using 
programming languages such as Java, Go, or Python for streaming import. This is 
why it is named Stream Load.
+
+The following example demonstrates this usage through a bash command pipeline. 
The imported data is generated on the fly by the program rather than read from 
a local file.
+
+```Bash
+seq 1 10 | awk '{OFS="\t"}{print $1, $1 * 10}' | curl --location-trusted -u 
root -T - http://host:port/api/testDb/testTbl/_stream_load
+```
+
+### Setting CSV first row filtering 
+
+File data:
+
+```Plain
+ id,name,age
+ 1,doris,20
+ 2,flink,10
+```
+
+Filtering the first row during load by specifying `format=csv_with_names`:
+
+```Plain
+curl --location-trusted -u root -T test.csv  -H "label:1" -H 
"format:csv_with_names" -H "column_separator:," 
http://host:port/api/testDb/testTbl/_stream_load
+```
+
+### Specifying merge_type for DELETE operations
+
+In stream load, there are three import types: APPEND, DELETE, and MERGE. These 
can be adjusted by specifying the parameter `merge_type`. If you want to 
specify that all data with the same key as the imported data should be deleted, 
you can use the following command:
+
+```Bash
+curl --location-trusted -u <doris_user>:<doris_password> \
+    -H "Expect:100-continue" \
+    -H "merge_type: DELETE" \
+    -H "column_separator:," \
+    -H "columns:user_id,name,age" \
+    -T streamload_example.csv \
+    -XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_streamload/_stream_load
+```
+
+Before loading:
+
+```sql
++--------+----------+----------+------+
+| siteid | citycode | username | pv   |
++--------+----------+----------+------+
+|      3 |        2 | tom      |    2 |
+|      4 |        3 | bush     |    3 |
+|      5 |        3 | helen    |    3 |
++--------+----------+----------+------+
+```
+
+The imported data is:
+
+```sql
+3,2,tom,0
+```
+
+After importing, the original table data will be deleted, resulting in the 
following result:
+
+```sql
++--------+----------+----------+------+
+| siteid | citycode | username | pv   |
++--------+----------+----------+------+
+|      4 |        3 | bush     |    3 |
+|      5 |        3 | helen    |    3 |
++--------+----------+----------+------+
+```
+
+### Specifying merge_type for MERGE operation
+
+By specifying `merge_type` as MERGE, the imported data can be merged into the 
table. The MERGE semantics need to be used in combination with the DELETE 
condition, which means that data satisfying the DELETE condition is processed 
according to the DELETE semantics, and the rest is added to the table according 
to the APPEND semantics. The following operation represents deleting the row 
with `siteid` of 1, and adding the rest of the data to the table:
+
+```Bash
+curl --location-trusted -u <doris_user>:<doris_password> \
+    -H "Expect:100-continue" \
+    -H "merge_type: MERGE" \
+    -H "delete: siteid=1" \
+    -H "column_separator:," \
+    -H "columns:user_id,name,age" \
+    -T streamload_example.csv \
+    -XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_streamload/_stream_load
+```
+
+Before loading:
+
+```sql
++--------+----------+----------+------+
+| siteid | citycode | username | pv   |
++--------+----------+----------+------+
+|      4 |        3 | bush     |    3 |
+|      5 |        3 | helen    |    3 |
+|      1 |        1 | jim      |    2 |
++--------+----------+----------+------+
 ```
-                         ^      +
-                         |      |
-                         |      | 1A. User submit load to FE
-                         |      |
-                         |   +--v-----------+
-                         |   | FE           |
-5. Return result to user |   +--+-----------+
-                         |      |
-                         |      | 2. Redirect to BE
-                         |      |
-                         |   +--v-----------+
-                         +---+Coordinator BE| 1B. User submit load to BE
-                             +-+-----+----+-+
-                               |     |    |
-                         +-----+     |    +-----+
-                         |           |          | 3. Distrbute data
-                         |           |          |
-                       +-v-+       +-v-+      +-v-+
-                       |BE |       |BE |      |BE |
-                       +---+       +---+      +---+
+
+The imported data is:
+
+```sql
+2,1,grace,2
+3,2,tom,2
+1,1,jim,2
+```
+
+After loading, the row with `siteid = 1` will be deleted according to the 
condition, and the rows with `siteid` of 2 and 3 will be added to the table:
+
+```sql
++--------+----------+----------+------+
+| siteid | citycode | username | pv   |
++--------+----------+----------+------+
+|      4 |        3 | bush     |    3 |
+|      2 |        1 | grace    |    2 |
+|      3 |        2 | tom      |    2 |
+|      5 |        3 | helen    |    3 |
++--------+----------+----------+------+
 ```
 
-In Stream load, Doris selects a node as the Coordinator node. This node is 
responsible for receiving data and distributing data to other data nodes.
+### Specifying sequence column for merge 
+
+When a table with a Unique Key has a Sequence column, the value of the 
Sequence column serves as the basis for the replacement order in the REPLACE 
aggregation function under the same Key column. A larger value can replace a 
smaller one. When marking deletions based on `DORIS_DELETE_SIGN` for such a 
table, it is necessary to ensure that the Key is the same and that the Sequence 
column value is greater than or equal to the current value. By specifying the 
`function_column.sequence_col` pa [...]
+
+```sql
+curl --location-trusted -u <doris_user>:<doris_password> \
+    -H "Expect:100-continue" \
+    -H "merge_type: DELETE" \
+    -H "function_column.sequence_col: age" 
+    -H "column_separator:," \
+    -H "columns: name, gender, age" 
+    -T streamload_example.csv \
+    -XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_streamload/_stream_load
+```
+
+Given the following table schema:
+
+```sql
+mysql> SET show_hidden_columns=true;
+Query OK, 0 rows affected (0.00 sec)
+
+mysql> DESC table1;
++------------------------+--------------+------+-------+---------+---------+
+| Field                  | Type         | Null | Key   | Default | Extra   |
++------------------------+--------------+------+-------+---------+---------+
+| name                   | VARCHAR(100) | No   | true  | NULL    |         |
+| gender                 | VARCHAR(10)  | Yes  | false | NULL    | REPLACE |
+| age                    | INT          | Yes  | false | NULL    | REPLACE |
+| __DORIS_DELETE_SIGN__  | TINYINT      | No   | false | 0       | REPLACE |
+| __DORIS_SEQUENCE_COL__ | INT          | Yes  | false | NULL    | REPLACE |
++------------------------+--------------+------+-------+---------+---------+
+5 rows in set (0.00 sec)
+```
+
+The original table data is:
+
+```SQL
++-------+--------+------+
+| name  | gender | age  |
++-------+--------+------+
+| li    | male   |   10 |
+| wang  | male   |   14 |
+| zhang | male   |   12 |
++-------+--------+------+
+```
+
+1. The sequence parameter takes effect: the sequence column value in the 
loaded data is larger than or equal to the existing value in the table.
+
+   The data to be loaded is:
+
+```SQL
+li,male,10
+```
+
+Since `function_column.sequence_col` is specified as `age`, and the `age` 
value is larger than or equal to the existing value in the table, the original 
row is deleted. The table data becomes:
+
+```SQL
++-------+--------+------+
+| name  | gender | age  |
++-------+--------+------+
+| wang  | male   |   14 |
+| zhang | male   |   12 |
++-------+--------+------+
+```
+
+2. The sequence parameter does not take effect: the sequence column value in 
the loaded data is less than the existing value in the table.
+
+   The data to be loaded is:
+
+```SQL
+li,male,9
+```
+
+Since `function_column.sequence_col` is specified as `age`, but the `age` 
value is less than the existing value in the table, the delete operation does 
not take effect. The table data remains unchanged, and the row with the primary 
key of `li` is still visible:
+
+```sql
++-------+--------+------+
+| name  | gender | age  |
++-------+--------+------+
+| li    | male   |   10 |
+| wang  | male   |   14 |
+| zhang | male   |   12 |
++-------+--------+------+
+```
+
+The row is not deleted because, at the storage layer, Doris first checks for 
rows with the same key and keeps the row with the larger sequence column value; 
only then does it check the `DORIS_DELETE_SIGN` value of that row. If it is 1, 
the row is not displayed externally. If it is 0, it is still read and displayed.
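+
+A hypothetical way to verify this behavior from the MySQL client is sketched below; it assumes hidden columns can be queried once `show_hidden_columns` is enabled, that the `table1` example above lives in the `testdb` database, and that the connection details are placeholders:
+
+```Bash
+# Inspect the delete sign and sequence value kept for the key that was "deleted".
+mysql -h <fe_ip> -P <fe_query_port> -u<doris_user> -p -e "
+SET show_hidden_columns = true;
+SELECT name, age, __DORIS_DELETE_SIGN__, __DORIS_SEQUENCE_COL__
+FROM testdb.table1 WHERE name = 'li';"
+```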
+
+### Loading data with enclosing characters
+
+When the data in a CSV file contains delimiters or separators, single-byte 
characters can be specified as enclosing characters to protect the data from 
being truncated.
+
+For example, in the following data where a comma is used as the separator but 
also exists within a field:
+
+```sql
+zhangsan,30,'Shanghai, HuangPu District, Dagu Road'
+```
+
+By specifying an enclosing character such as a single quotation mark ', the 
entire `Shanghai, HuangPu District, Dagu Road` can be treated as a single field.
+
+```sql
+curl --location-trusted -u <doris_user>:<doris_password> \
+    -H "Expect:100-continue" \
+    -H "column_separator:," \
+    -H "enclose:'" \
+    -H "escape:\" \
+    -H "columns:username,age,address" \
+    -T streamload_example.csv \
+    -XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_streamload/_stream_load
+```
+
+If the enclosing character also appears within a field, such as wanting to 
treat `Shanghai City, Huangpu District, \'Dagu Road` as a single field, it is 
necessary to first perform string escaping within the column:
+
+```
+Zhang San,30,'Shanghai, Huangpu District, \'Dagu Road'
+```
+
+An escape character, which is a single-byte character, can be specified using 
the escape parameter. In the example, the backslash `\` is used as the escape 
character.
+
+```sql
+curl --location-trusted -u <doris_user>:<doris_password> \
+    -H "Expect:100-continue" \
+    -H "column_separator:," \
+    -H "enclose:'" \
+    -H "columns:username,age,address" \
+    -T streamload_example.csv \
+    -XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_streamload/_stream_load
+```
+
+### Loading fields containing default CURRENT_TIMESTAMP type
+
+Here's an example of loading data into a table that contains a field with the 
DEFAULT CURRENT_TIMESTAMP type:
+
+Table schema:
+
+```sql
+`id` bigint(30) NOT NULL,
+`order_code` varchar(30) DEFAULT NULL COMMENT '',
+`create_time` datetimev2(3) DEFAULT CURRENT_TIMESTAMP
+```
+
+JSON data type:
+
+```Plain
+{"id":1,"order_Code":"avc"}
+```
+
+Command:
+
+```Bash
+curl --location-trusted -u root -T test.json -H "label:1" -H "format:json" -H 
'columns: id, order_code, create_time=CURRENT_TIMESTAMP()' 
http://host:port/api/testDb/testTbl/_stream_load
+```
+
+### Simple mode for loading JSON format data
+
+When the JSON fields correspond one-to-one with the column names in the table, 
you can import data in JSON format into the table by specifying the parameters 
"strip_outer_array:true" and "format:json".
+
+For example, if the table is defined as follows:
+
+```sql
+CREATE TABLE testdb.test_streamload(
+    user_id            BIGINT       NOT NULL COMMENT "用户 ID",
+    name               VARCHAR(20)           COMMENT "用户姓名",
+    age                INT                   COMMENT "用户年龄"
+)
+DUPLICATE KEY(user_id)
+DISTRIBUTED BY HASH(user_id) BUCKETS 10;
+```
+
+And the data field names correspond one-to-one with the column names in the 
table:
+
+```sql
+[
+{"user_id":1,"name":"Emily","age":25},
+{"user_id":2,"name":"Benjamin","age":35},
+{"user_id":3,"name":"Olivia","age":28},
+{"user_id":4,"name":"Alexander","age":60},
+{"user_id":5,"name":"Ava","age":17},
+{"user_id":6,"name":"William","age":69},
+{"user_id":7,"name":"Sophia","age":32},
+{"user_id":8,"name":"James","age":64},
+{"user_id":9,"name":"Emma","age":37},
+{"user_id":10,"name":"Liam","age":64}
+]
+```
+
+You can use the following command to load JSON data into the table:
+
+```sql
+curl --location-trusted -u <doris_user>:<doris_password> \
+    -H "Expect:100-continue" \
+    -H "format:json" \
+    -H "strip_outer_array:true" \
+    -T streamload_example.csv \
+    -XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_streamload/_stream_load
+```
+
+### Matching mode for loading complex JSON format data
+
+When the JSON data is more complex and cannot correspond one-to-one with the 
column names in the table, or there are extra columns, you can use the 
jsonpaths parameter to complete the column name mapping and perform data 
matching import. For example, with the following data:
+
+```sql
+[
+{"userid":1,"hudi":"lala","username":"Emily","userage":25,"userhp":101},
+{"userid":2,"hudi":"kpkp","username":"Benjamin","userage":35,"userhp":102},
+{"userid":3,"hudi":"ji","username":"Olivia","userage":28,"userhp":103},
+{"userid":4,"hudi":"popo","username":"Alexander","userage":60,"userhp":103},
+{"userid":5,"hudi":"uio","username":"Ava","userage":17,"userhp":104},
+{"userid":6,"hudi":"lkj","username":"William","userage":69,"userhp":105},
+{"userid":7,"hudi":"komf","username":"Sophia","userage":32,"userhp":106},
+{"userid":8,"hudi":"mki","username":"James","userage":64,"userhp":107},
+{"userid":9,"hudi":"hjk","username":"Emma","userage":37,"userhp":108},
+{"userid":10,"hudi":"hua","username":"Liam","userage":64,"userhp":109}
+]
+```
+
+You can specify the jsonpaths parameter to match the specified columns:
+
+```sql
+curl --location-trusted -u <doris_user>:<doris_password> \
+    -H "Expect:100-continue" \
+    -H "format:json" \
+    -H "strip_outer_array:true" \
+    -H "jsonpaths:[\"$.userid\", \"$.username\", \"$.userage\"]" \
+    -H "columns:user_id,name,age" \
+    -T streamload_example.csv \
+    -XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_streamload/_stream_load
+```
+
+### Specifying JSON root node for data load
+
+If the JSON data contains nested JSON fields, you need to specify the root 
node of the imported JSON. The default value is "".
+
+For example, with the following data, if you want to import the data in the 
comment column into the table:
+
+```sql
+[
+    {"user":1,"comment":{"userid":101,"username":"Emily","userage":25}},
+    {"user":2,"comment":{"userid":102,"username":"Benjamin","userage":35}},
+    {"user":3,"comment":{"userid":103,"username":"Olivia","userage":28}},
+    {"user":4,"comment":{"userid":104,"username":"Alexander","userage":60}},
+    {"user":5,"comment":{"userid":105,"username":"Ava","userage":17}},
+    {"user":6,"comment":{"userid":106,"username":"William","userage":69}},
+    {"user":7,"comment":{"userid":107,"username":"Sophia","userage":32}},
+    {"user":8,"comment":{"userid":108,"username":"James","userage":64}},
+    {"user":9,"comment":{"userid":109,"username":"Emma","userage":37}},
+    {"user":10,"comment":{"userid":110,"username":"Liam","userage":64}}
+    ]
+```
+
+First, you need to specify the root node as comment using the json_root 
parameter, and then complete the column name mapping according to the jsonpaths 
parameter.
+
+```sql
+curl --location-trusted -u <doris_user>:<doris_password> \
+    -H "Expect:100-continue" \
+    -H "format:json" \
+    -H "strip_outer_array:true" \
+    -H "json_root: $.comment" \
+    -H "jsonpaths:[\"$.userid\", \"$.username\", \"$.userage\"]" \
+    -H "columns:user_id,name,age" \
+    -T streamload_example.csv \
+    -XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_streamload/_stream_load
+```
+
+### Loading array data type
+
+For example, if the following data contains an array type:
+
+```sql
+1|Emily|[1,2,3,4]
+2|Benjamin|[22,45,90,12]
+3|Olivia|[23,16,19,16]
+4|Alexander|[123,234,456]
+5|Ava|[12,15,789]
+6|William|[57,68,97]
+7|Sophia|[46,47,49]
+8|James|[110,127,128]
+9|Emma|[19,18,123,446]
+10|Liam|[89,87,96,12]
+```
+
+Load data into the following table structure:
+
+```sql
+CREATE TABLE testdb.test_streamload(
+    typ_id     BIGINT          NOT NULL COMMENT "ID",
+    name       VARCHAR(20)     NULL     COMMENT "名称",
+    arr        ARRAY<int(10)>  NULL     COMMENT "数组"
+)
+DUPLICATE KEY(typ_id)
+DISTRIBUTED BY HASH(typ_id) BUCKETS 10;
+```
+
+You can directly load the ARRAY type from a text file into the table using a 
Stream Load job.
+
+```sql
+curl --location-trusted -u <doris_user>:<doris_password> \
+    -H "Expect:100-continue" \
+    -H "column_separator:|" \
+    -H "columns:typ_id,name,arr" \
+    -T streamload_example.csv \
+    -XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_streamload/_stream_load
+```
+
+### Loading map data type
+
+When the imported data contains a map type, as in the following example:
+
+```SQL
+[
+{"user_id":1,"namemap":{"Emily":101,"age":25}},
+{"user_id":2,"namemap":{"Benjamin":102,"age":35}},
+{"user_id":3,"namemap":{"Olivia":103,"age":28}},
+{"user_id":4,"namemap":{"Alexander":104,"age":60}},
+{"user_id":5,"namemap":{"Ava":105,"age":17}},
+{"user_id":6,"namemap":{"William":106,"age":69}},
+{"user_id":7,"namemap":{"Sophia":107,"age":32}},
+{"user_id":8,"namemap":{"James":108,"age":64}},
+{"user_id":9,"namemap":{"Emma":109,"age":37}},
+{"user_id":10,"namemap":{"Liam":110,"age":64}}
+]
+```
+
+Load data into the following table structure:
+
+```sql
+CREATE TABLE testdb.test_streamload(
+    user_id            BIGINT       NOT NULL COMMENT "ID",
+    namemap            Map<STRING, INT>  NULL     COMMENT "Name"
+)
+DUPLICATE KEY(user_id)
+DISTRIBUTED BY HASH(user_id) BUCKETS 10;
+```
+
+You can directly load the MAP type from a JSON file into the table using a Stream Load job.
+
+```shell
+curl --location-trusted -u <doris_user>:<doris_password> \
+    -H "Expect:100-continue" \
+    -H "format: json" \
+    -H "strip_outer_array:true" \
+    -T streamload_example.csv \
+    -XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_streamload/_stream_load
+```
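+
+After the load finishes, the MAP values can be inspected with map functions. The query below is a sketch; map_keys and map_values are assumed to be available in your Doris version.
+
+```sql
+-- Inspect the loaded MAP column by splitting it into its keys and values.
+SELECT user_id,
+       map_keys(namemap)   AS name_keys,
+       map_values(namemap) AS name_values
+FROM testdb.test_streamload
+ORDER BY user_id;
+```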
+
+### Loading bitmap data type
+
+When loading data into a BITMAP column, you can use the to_bitmap function to convert the source data into BITMAP values, or use the bitmap_empty function to fill the column with an empty BITMAP.
+
+For example, with the following data:
+
+```text
+1|koga|17723
+2|nijg|146285
+3|lojn|347890
+4|lofn|489871
+5|jfin|545679
+6|kon|676724
+7|nhga|767689
+8|nfubg|879878
+9|huang|969798
+10|buag|97997
+```
+
+Load data into the following table containing the Bitmap type:
+
+```sql
+CREATE TABLE testdb.test_streamload(
+    typ_id     BIGINT                NULL   COMMENT "ID",
+    hou        VARCHAR(10)           NULL   COMMENT "one",
+    arr        BITMAP  BITMAP_UNION  NULL   COMMENT "two"
+)
+AGGREGATE KEY(typ_id,hou)
+DISTRIBUTED BY HASH(typ_id,hou) BUCKETS 10;
+```
+
+Use the to_bitmap function to convert the data into the BITMAP type during loading:
+
+```shell
+curl --location-trusted -u <doris_user>:<doris_password> \
+    -H "Expect:100-continue" \
+    -H "column_separator:|" \
+    -H "columns:typ_id,hou,arr,arr=to_bitmap(arr)" \
+    -T streamload_example.csv \
+    -XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_streamload/_stream_load
+```
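+
+After the load finishes, BITMAP columns are read through bitmap functions. The queries below are a sketch using bitmap_union, bitmap_to_string, and bitmap_union_count.
+
+```sql
+-- Show the bitmap contents per key after aggregation.
+SELECT typ_id,
+       hou,
+       bitmap_to_string(bitmap_union(arr)) AS arr_values
+FROM testdb.test_streamload
+GROUP BY typ_id, hou;
+
+-- Count the distinct values stored across all bitmaps.
+SELECT bitmap_union_count(arr) AS distinct_values
+FROM testdb.test_streamload;
+```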
+
+### Loading HyperLogLog data type
+
+You can use the hll_hash function to convert data into the hll type, as in the 
following example:
+
+```text
+1001|koga
+1002|nijg
+1003|lojn
+1004|lofn
+1005|jfin
+1006|kon
+1007|nhga
+1008|nfubg
+1009|huang
+1010|buag
+```
+
+Load data into the following table:
+
+```sql
+CREATE TABLE testdb.test_streamload(
+    typ_id           BIGINT          NULL   COMMENT "ID",
+    typ_name         VARCHAR(10)     NULL   COMMENT "NAME",
+    pv               hll hll_union   NULL   COMMENT "hll"
+)
+AGGREGATE KEY(typ_id,typ_name)
+DISTRIBUTED BY HASH(typ_id) BUCKETS 10;
+```
+
+Use the hll_hash function to convert the data during loading:
+
+```shell
+curl --location-trusted -u <doris_user>:<doris_password> \
+    -H "Expect:100-continue" \
+    -H "column_separator:|" \
+    -H "columns:typ_id,typ_name,pv=hll_hash(typ_id)" \
+    -T streamload_example.csv \
+    -XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_streamload/_stream_load
+```
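+
+After the load finishes, HLL columns are queried through HLL functions. The query below is a sketch using hll_union_agg for an approximate distinct count.
+
+```sql
+-- Approximate number of distinct typ_id values recorded in the HLL column.
+SELECT hll_union_agg(pv) AS approx_distinct_ids
+FROM testdb.test_streamload;
+```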
+
+### Label, load transactions, and multi-table atomicity
+
+All load jobs in Doris take effect atomically, and loading multiple tables within the same load job also guarantees atomicity. Doris also uses the Label mechanism to ensure that data is loaded without loss or duplication. For details, please refer to the [Import Transactions and Atomicity](../../data-operate/import/load-atomicity) documentation.
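+
+For example, a label can be attached to a Stream Load job through the label header so that retries of the same batch are deduplicated. The command below is a sketch; the label value streamload_example_batch_001 is arbitrary.
+
+```shell
+curl --location-trusted -u <doris_user>:<doris_password> \
+    -H "Expect:100-continue" \
+    -H "label:streamload_example_batch_001" \
+    -H "column_separator:," \
+    -T streamload_example.csv \
+    -XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_streamload/_stream_load
+```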
+
+### Column mapping, derived columns, and filtering
+
+Doris supports a rich set of column transformations and filtering operations in load statements, covering most built-in functions and UDFs. For how to use this feature, please refer to the [Data Transformation](../../data-operate/import/load-data-convert) documentation.
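+
+For example, derived columns and row filtering can be combined in a single Stream Load request through the columns and where headers. The command below is a sketch; the expressions are illustrative only.
+
+```shell
+curl --location-trusted -u <doris_user>:<doris_password> \
+    -H "Expect:100-continue" \
+    -H "column_separator:|" \
+    -H "columns:typ_id,name,typ_id=typ_id+1" \
+    -H "where:typ_id>5" \
+    -T streamload_example.csv \
+    -XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_streamload/_stream_load
+```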
+
+### Enable strict mode import
+
+The strict_mode attribute is used to set whether the import task runs in 
strict mode. This attribute affects the results of column mapping, 
transformation, and filtering, and it also controls the behavior of partial 
column updates. For specific instructions on strict mode, please refer to the 
[Strict Mode](../../data-operate/import/load-strict-mode) documentation.
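+
+For example, strict mode can be enabled for a single Stream Load job through the strict_mode header. The command below is a sketch.
+
+```shell
+curl --location-trusted -u <doris_user>:<doris_password> \
+    -H "Expect:100-continue" \
+    -H "strict_mode:true" \
+    -H "column_separator:," \
+    -T streamload_example.csv \
+    -XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_streamload/_stream_load
+```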
+
+### Perform partial column updates during import
+
+For how to perform partial column updates during import, please refer to the Data Update documentation in the Data Manipulation section.
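+
+As a rough sketch, a partial column update is typically expressed by listing only the columns to be updated in the columns header and enabling the partial_columns header. Whether this header applies depends on your table model and Doris version, so treat the command below as an assumption and check the Data Update documentation for details.
+
+```shell
+# Sketch only: partial_columns is assumed to be supported for your table model.
+curl --location-trusted -u <doris_user>:<doris_password> \
+    -H "Expect:100-continue" \
+    -H "partial_columns:true" \
+    -H "column_separator:," \
+    -H "columns:user_id,age" \
+    -T streamload_example.csv \
+    -XPUT http://<fe_ip>:<fe_http_port>/api/testdb/test_streamload/_stream_load
+```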
+
+## More help
+
+For more detailed syntax and best practices on using Stream Load, please refer to the [Stream Load](../../sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD) Command Manual. You can also enter `HELP STREAM LOAD` on the MySQL client command line for more help.
+
 
-Users submit import commands through HTTP protocol. If submitted to FE, FE 
forwards the request to a BE via the HTTP redirect instruction. Users can also 
submit import commands directly to a specified BE.
 
-The final result of the import is returned to the user by Coordinator BE.
 
-## Support data format
 
-Stream Load currently supports data formats: CSV (text), JSON
 
-<version since="1.2">supports PARQUET and ORC</version>
 
-## Basic operations
-### Create Load
 
 Stream load submits and transfers data through HTTP protocol. Here, the `curl` 
command shows how to submit an import.
 
@@ -168,10 +1160,10 @@ Stream load uses HTTP protocol, so all parameters 
related to import tasks are se
     Examples of column order transformation: There are three columns of 
original data (src_c1,src_c2,src_c3), and there are also three columns 
(dst_c1,dst_c2,dst_c3) in the doris table at present.
     when the first column src_c1 of the original file corresponds to the 
dst_c1 column of the target table, while the second column src_c2 of the 
original file corresponds to the dst_c2 column of the target table and the 
third column src_c3 of the original file corresponds to the dst_c3 column of 
the target table, which is written as follows:
     columns: dst_c1, dst_c2, dst_c3
-
+  
     when the first column src_c1 of the original file corresponds to the 
dst_c2 column of the target table, while the second column src_c2 of the 
original file corresponds to the dst_c3 column of the target table and the 
third column src_c3 of the original file corresponds to the dst_c1 column of 
the target table, which is written as follows:
     columns: dst_c2, dst_c3, dst_c1
-
+  
     Example of expression transformation: There are two columns in the 
original file and two columns in the target table (c1, c2). However, both 
columns in the original file need to be transformed by functions to correspond 
to the two columns in the target table.
    columns: tmp_c1, tmp_c2, c1 = year(tmp_c1), c2 = month(tmp_c2)
     Tmp_* is a placeholder, representing two original columns in the original 
file.
@@ -229,9 +1221,9 @@ Stream load uses HTTP protocol, so all parameters related 
to import tasks are se
   }
   ```
     2. Trigger the commit operation on the transaction.
-    Note 1) requesting to fe and be both works
-    Note 2) `{table}` in url can be omit when commit
-  using txn id
+      Note 1) The request can be sent to either the FE or the BE.
+      Note 2) `{table}` in the URL can be omitted when committing using the txn id.
   ```shell
   curl -X PUT --location-trusted -u user:passwd  -H "txn_id:18036" -H 
"txn_operation:commit"  
http://fe_host:http_port/api/{db}/{table}/_stream_load_2pc
   {
@@ -248,9 +1240,9 @@ Stream load uses HTTP protocol, so all parameters related 
to import tasks are se
   }
   ```
     3. Trigger an abort operation on a transaction
-    Note 1) requesting to fe and be both works
-    Note 2) `{table}` in url can be omit when abort
-  using txn id
+      Note 1) The request can be sent to either the FE or the BE.
+      Note 2) `{table}` in the URL can be omitted when aborting using the txn id.
   ```shell
   curl -X PUT --location-trusted -u user:passwd  -H "txn_id:18037" -H 
"txn_operation:abort"  
http://fe_host:http_port/api/{db}/{table}/_stream_load_2pc
   {
@@ -514,7 +1506,7 @@ Doris provides StreamLoad examples in three languages: 
[Java](https://github.com
           <version>4.5.13</version>
         </dependency>
     ```
- 
+
 * After enabling the Stream Load record on the BE, the record cannot be queried
 
   This is caused by the slowness of fetching records, you can try to adjust 
the following parameters:


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
