This is an automated email from the ASF dual-hosted git repository.
dataroaring pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push:
new 2b7d60eb4d [fix](docs) fix stream load docs (#23472)
2b7d60eb4d is described below
commit 2b7d60eb4daba54066712f9c8333e06daccbf3b0
Author: Kaijie Chen <[email protected]>
AuthorDate: Fri Aug 25 19:28:40 2023 +0800
[fix](docs) fix stream load docs (#23472)
---
.../import/import-way/stream-load-manual.md | 94 +++++++++++-----------
.../import/import-way/stream-load-manual.md | 8 +-
2 files changed, 51 insertions(+), 51 deletions(-)
diff --git a/docs/en/docs/data-operate/import/import-way/stream-load-manual.md b/docs/en/docs/data-operate/import/import-way/stream-load-manual.md
index bbec22ecb6..530cadf317 100644
--- a/docs/en/docs/data-operate/import/import-way/stream-load-manual.md
+++ b/docs/en/docs/data-operate/import/import-way/stream-load-manual.md
@@ -94,7 +94,7 @@ The detailed syntax for creating imports helps to execute ``HELP STREAM LOAD`` v
+ user/passwd
- Stream load uses the HTTP protocol to create the imported protocol and
signs it through the Basic Access authentication. The Doris system verifies
user identity and import permissions based on signatures.
+ Stream load uses the HTTP protocol to create the imported protocol and signs
it through the Basic Access authentication. The Doris system verifies user
identity and import permissions based on signatures.
**Load Parameters**
@@ -102,11 +102,11 @@ Stream load uses HTTP protocol, so all parameters related to import tasks are se
+ label
- Identity of import task. Each import task has a unique label inside a
single database. Label is a user-defined name in the import command. With this
label, users can view the execution of the corresponding import task.
+ Identity of import task. Each import task has a unique label inside a single
database. Label is a user-defined name in the import command. With this label,
users can view the execution of the corresponding import task.
- Another function of label is to prevent users from importing the same
data repeatedly. **It is strongly recommended that users use the same label for
the same batch of data. This way, repeated requests for the same batch of data
will only be accepted once, guaranteeing at-Most-Once**
+ Another function of label is to prevent users from importing the same data
repeatedly. **It is strongly recommended that users use the same label for the
same batch of data. This way, repeated requests for the same batch of data will
only be accepted once, guaranteeing at-Most-Once**
- When the corresponding import operation state of label is CANCELLED,
the label can be used again.
+ When the corresponding import operation state of label is CANCELLED, the
label can be used again.
+ column_separator
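Editor's sketch for the label recommendation above: one way to give the same batch the same label is to derive the label from the batch content, so a retried request reuses it. The helper name `batch_label`, the `batch` prefix, and the 16-character digest length are illustrative assumptions, not part of Doris.

```python
import hashlib


def batch_label(batch: bytes, prefix: str = "batch") -> str:
    """Derive a deterministic label from the batch content (hypothetical helper)."""
    digest = hashlib.sha256(batch).hexdigest()
    # Identical bytes always hash to the same digest, so a retry gets the same label.
    return f"{prefix}_{digest[:16]}"


first_try = batch_label(b"1,alice\n2,bob\n")
retry = batch_label(b"1,alice\n2,bob\n")
other = batch_label(b"3,carol\n")
print(first_try == retry, first_try == other)
```

A content hash is only one option; any scheme that maps the same batch to the same label serves the same purpose.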
@@ -125,17 +125,17 @@ Stream load uses HTTP protocol, so all parameters related to import tasks are se
+ max\_filter\_ratio
- The maximum tolerance rate of the import task is 0 by default, and the
range of values is 0-1. When the import error rate exceeds this value, the
import fails.
+ The maximum tolerance rate of the import task is 0 by default, and the range
of values is 0-1. When the import error rate exceeds this value, the import
fails.
- If the user wishes to ignore the wrong row, the import can be
successful by setting this parameter greater than 0.
+ If the user wishes to ignore the wrong row, the import can be successful by
setting this parameter greater than 0.
- The calculation formula is as follows:
+ The calculation formula is as follows:
``` (dpp.abnorm.ALL / (dpp.abnorm.ALL + dpp.norm.ALL ) ) >
max_filter_ratio ```
- ``` dpp.abnorm.ALL``` denotes the number of rows whose data quality is
not up to standard. Such as type mismatch, column mismatch, length mismatch and
so on.
+ ``` dpp.abnorm.ALL``` denotes the number of rows whose data quality is not
up to standard. Such as type mismatch, column mismatch, length mismatch and so
on.
- ``` dpp.norm.ALL ``` refers to the number of correct data in the import
process. The correct amount of data for the import task can be queried by the
``SHOW LOAD` command.
+ ``` dpp.norm.ALL ``` refers to the number of correct data in the import
process. The correct amount of data for the import task can be queried by the
``SHOW LOAD` command.
The number of rows in the original file = `dpp.abnorm.ALL + dpp.norm.ALL`
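Editor's sketch of the check described above, assuming the two counters are available as plain integers. The function name and the handling of an empty file are assumptions made for illustration.

```python
def exceeds_max_filter_ratio(abnormal_rows: int, normal_rows: int,
                             max_filter_ratio: float = 0.0) -> bool:
    """Return True when the error rate is above max_filter_ratio (import fails)."""
    if not 0.0 <= max_filter_ratio <= 1.0:
        raise ValueError("max_filter_ratio must be within 0-1")
    total_rows = abnormal_rows + normal_rows  # rows in the original file
    if total_rows == 0:
        return False  # assumption: an empty file has no error rate to exceed
    return abnormal_rows / total_rows > max_filter_ratio


# 5 bad rows out of 100: fails with the default of 0, passes with a tolerance of 0.1.
print(exceeds_max_filter_ratio(5, 95))
print(exceeds_max_filter_ratio(5, 95, max_filter_ratio=0.1))
```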
@@ -182,10 +182,6 @@ Stream load uses HTTP protocol, so all parameters related to import tasks are se
Stream load import can enable two-stage transaction commit mode: in the
stream load process, the data is written and the information is returned to the
user. At this time, the data is invisible and the transaction status is
`PRECOMMITTED`. After the user manually triggers the commit operation, the data
is visible.
-+ enable_profile
-
- <version since="1.2.7">When `enable_profile` is true, the Stream Load
profile will be printed to logs (be.INFO).</version>
-
Example:
1. Initiate a stream load pre-commit operation
@@ -231,6 +227,10 @@ Stream load uses HTTP protocol, so all parameters related to import tasks are se
}
```
++ enable_profile
+
+ <version since="1.2.7">When `enable_profile` is true, the Stream Load
profile will be printed to logs (be.INFO).</version>
+
### Use stream load with SQL
You can add a `sql` parameter to the `Header` to replace the
`column_separator`, `line_delimiter`, `where`, `columns` in the previous
parameter, which is convenient to use.
@@ -295,13 +295,13 @@ The following main explanations are given for the Stream load import result para
+ Status: Import completion status.
- "Success": Indicates successful import.
+ "Success": Indicates successful import.
- "Publish Timeout": This state also indicates that the import has been
completed, except that the data may be delayed and visible without retrying.
+ "Publish Timeout": This state also indicates that the import has been
completed, except that the data may be delayed and visible without retrying.
- "Label Already Exists": Label duplicate, need to be replaced Label.
+ "Label Already Exists": Label duplicate, need to be replaced Label.
- "Fail": Import failed.
+ "Fail": Import failed.
+ ExistingJobStatus: The state of the load job corresponding to the existing
Label.
@@ -351,15 +351,15 @@ By default, BE does not record Stream Load records. If you want to view records
+ stream\_load\_default\_timeout\_second
- The timeout time of the import task (in seconds) will be cancelled by
the system if the import task is not completed within the set timeout time, and
will become CANCELLED.
+ The timeout time of the import task (in seconds) will be cancelled by the
system if the import task is not completed within the set timeout time, and
will become CANCELLED.
- At present, Stream load does not support custom import timeout time.
All Stream load import timeout time is uniform. The default timeout time is 600
seconds. If the imported source file can no longer complete the import within
the specified time, the FE parameter ```stream_load_default_timeout_second```
needs to be adjusted.
+ At present, Stream load does not support custom import timeout time. All
Stream load import timeout time is uniform. The default timeout time is 600
seconds. If the imported source file can no longer complete the import within
the specified time, the FE parameter ```stream_load_default_timeout_second```
needs to be adjusted.
### BE configuration
+ streaming\_load\_max\_mb
- The maximum import size of Stream load is 10G by default, in MB. If the
user's original file exceeds this value, the BE parameter
```streaming_load_max_mb``` needs to be adjusted.
+ The maximum import size of Stream load is 10G by default, in MB. If the
user's original file exceeds this value, the BE parameter
```streaming_load_max_mb``` needs to be adjusted.
## Best Practices
@@ -391,17 +391,17 @@ Cluster situation: The concurrency of Stream load is not affected by cluster siz
+ Step 1: Does the import file size exceed the default maximum import size of
10G
- ```
- BE conf
- streaming_load_max_mb = 16000
- ```
+ ```
+ BE conf
+ streaming_load_max_mb = 16000
+ ```
+ Step 2: Calculate whether the approximate import time exceeds the default
timeout value
- ```
- Import time 15000/10 = 1500s
- Over the default timeout time, you need to modify the FE configuration
- stream_load_default_timeout_second = 1500
- ```
+ ```
+ Import time 15000/10 = 1500s
+ Over the default timeout time, you need to modify the FE configuration
+ stream_load_default_timeout_second = 1500
+ ```
+ Step 3: Create Import Tasks
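Editor's sketch of the sizing checks in Steps 1 and 2 above. It assumes the figures `15000` and `10` are a file size in MB and a throughput in MB/s, and that the 10G default equals `10 * 1024` MB; the function name is hypothetical.

```python
import math

# Assumed defaults: 10G expressed in MB, and the 600-second timeout quoted in the docs.
DEFAULT_MAX_MB = 10 * 1024
DEFAULT_TIMEOUT_S = 600


def plan_stream_load(file_mb: float, speed_mb_per_s: float) -> dict:
    """Report which defaults a load of this size would exceed."""
    estimated_seconds = math.ceil(file_mb / speed_mb_per_s)
    return {
        "raise_streaming_load_max_mb": file_mb > DEFAULT_MAX_MB,
        "estimated_seconds": estimated_seconds,
        "raise_stream_load_default_timeout_second": estimated_seconds > DEFAULT_TIMEOUT_S,
    }


plan = plan_stream_load(15000, 10)
print(plan)
```

With these inputs the estimate is 1500 seconds, matching the value shown in Step 2, and both defaults need to be raised.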
@@ -429,35 +429,35 @@ Doris provides StreamLoad examples in three languages: [Java](https://github.com
* Label Already Exists
- The Label repeat checking steps of Stream load are as follows:
+ The Label repeat checking steps of Stream load are as follows:
- 1. Is there an import Label conflict that already exists with other
import methods?
+ 1. Is there an import Label conflict that already exists with other import
methods?
- Because imported Label in Doris system does not distinguish
between import methods, there is a problem that other import methods use the
same Label.
+ Because imported Label in Doris system does not distinguish between import
methods, there is a problem that other import methods use the same Label.
- Through ``SHOW LOAD WHERE LABEL = "xxx"'``, where XXX is a
duplicate Label string, see if there is already a Label imported by FINISHED
that is the same as the Label created by the user.
+ Through ``SHOW LOAD WHERE LABEL = "xxx"'``, where XXX is a duplicate Label
string, see if there is already a Label imported by FINISHED that is the same
as the Label created by the user.
- 2. Are Stream loads submitted repeatedly for the same job?
+ 2. Are Stream loads submitted repeatedly for the same job?
- Since Stream load is an HTTP protocol submission creation
import task, HTTP Clients in various languages usually have their own request
retry logic. After receiving the first request, the Doris system has started to
operate Stream load, but because the result is not returned to the Client side
in time, the Client side will retry to create the request. At this point, the
Doris system is already operating on the first request, so the second request
will be reported to Label Already Exists.
+ Since Stream load is an HTTP protocol submission creation import task,
HTTP Clients in various languages usually have their own request retry logic.
After receiving the first request, the Doris system has started to operate
Stream load, but because the result is not returned to the Client side in time,
the Client side will retry to create the request. At this point, the Doris
system is already operating on the first request, so the second request will be
reported to Label Already Exists.
- To sort out the possible methods mentioned above: Search FE
Master's log with Label to see if there are two ``redirect load action to
destination = ``redirect load action to destination cases in the same Label. If
so, the request is submitted repeatedly by the Client side.
+ To sort out the possible methods mentioned above: Search FE Master's log
with Label to see if there are two ``redirect load action to destination =
``redirect load action to destination cases in the same Label. If so, the
request is submitted repeatedly by the Client side.
- It is recommended that the user calculate the approximate
import time based on the amount of data currently requested, and change the
request overtime on the client side to a value greater than the import timeout
time according to the import timeout time to avoid multiple submissions of the
request by the client side.
+ It is recommended that the user calculate the approximate import time
based on the amount of data currently requested, and change the request
overtime on the client side to a value greater than the import timeout time
according to the import timeout time to avoid multiple submissions of the
request by the client side.
- 3. Connection reset abnormal
+ 3. Connection reset abnormal
- In the community version 0.14.0 and earlier versions, the connection
reset exception occurred after Http V2 was enabled, because the built-in web
container is tomcat, and Tomcat has pits in 307 (Temporary Redirect). There is
a problem with the implementation of this protocol. All In the case of using
Stream load to import a large amount of data, a connect reset exception will
occur. This is because tomcat started data transmission before the 307 jump,
which resulted in the lack of aut [...]
+ In the community version 0.14.0 and earlier versions, the connection reset
exception occurred after Http V2 was enabled, because the built-in web
container is tomcat, and Tomcat has pits in 307 (Temporary Redirect). There is
a problem with the implementation of this protocol. All In the case of using
Stream load to import a large amount of data, a connect reset exception will
occur. This is because tomcat started data transmission before the 307 jump,
which resulted in the lack of au [...]
- After the upgrade, also upgrade the http client version of your
program to `4.5.13`,Introduce the following dependencies in your pom.xml file
+ After the upgrade, also upgrade the http client version of your program to
`4.5.13`,Introduce the following dependencies in your pom.xml file
- ```xml
- <dependency>
- <groupId>org.apache.httpcomponents</groupId>
- <artifactId>httpclient</artifactId>
- <version>4.5.13</version>
- </dependency>
- ```
+ ```xml
+ <dependency>
+ <groupId>org.apache.httpcomponents</groupId>
+ <artifactId>httpclient</artifactId>
+ <version>4.5.13</version>
+ </dependency>
+ ```
* After enabling the Stream Load record on the BE, the record cannot be queried
diff --git a/docs/zh-CN/docs/data-operate/import/import-way/stream-load-manual.md b/docs/zh-CN/docs/data-operate/import/import-way/stream-load-manual.md
index 7153c97978..7e5d6a4dbf 100644
--- a/docs/zh-CN/docs/data-operate/import/import-way/stream-load-manual.md
+++ b/docs/zh-CN/docs/data-operate/import/import-way/stream-load-manual.md
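@@ -193,10 +193,6 @@ Stream Load uses the HTTP protocol, so all import-task-related
Stream load import can enable two-stage transaction commit mode: during Stream
load, information is returned to the user as soon as the data is written. At this point the data is invisible and the transaction status is `PRECOMMITTED`. The data becomes visible only after the user manually triggers the commit operation.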
-- enable_profile
-
- <version since="1.2.7">When `enable_profile` is true, the Stream Load profile
will be printed to the be.INFO log.</version>
-
Example:
1. Initiate a stream load pre-commit operation
@@ -242,6 +238,10 @@ Stream Load uses the HTTP protocol, so all import-task-related
}
```
+- enable_profile
+
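+ <version since="1.2.7">When `enable_profile` is true, the Stream Load profile
will be printed to the be.INFO log.</version>
+
### Use SQL to express Stream Load parameters
You can add a `sql` parameter to the Header to replace the `column_separator`, `line_delimiter`, `where`, and `columns` parameters described earlier, which is convenient to use.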