[doris] branch master updated: [fix](docs) fix stream load docs (#23472)

dataroaring Fri, 25 Aug 2023 04:28:54 -0700

This is an automated email from the ASF dual-hosted git repository.

dataroaring pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git



The following commit(s) were added to refs/heads/master by this push:
     new 2b7d60eb4d [fix](docs) fix stream load docs (#23472)
2b7d60eb4d is described below

commit 2b7d60eb4daba54066712f9c8333e06daccbf3b0
Author: Kaijie Chen <[email protected]>
AuthorDate: Fri Aug 25 19:28:40 2023 +0800

    [fix](docs) fix stream load docs (#23472)
---
 .../import/import-way/stream-load-manual.md        | 94 +++++++++++-----------
 .../import/import-way/stream-load-manual.md        |  8 +-
 2 files changed, 51 insertions(+), 51 deletions(-)

diff --git a/docs/en/docs/data-operate/import/import-way/stream-load-manual.md b/docs/en/docs/data-operate/import/import-way/stream-load-manual.md
index bbec22ecb6..530cadf317 100644
--- a/docs/en/docs/data-operate/import/import-way/stream-load-manual.md
+++ b/docs/en/docs/data-operate/import/import-way/stream-load-manual.md
@@ -94,7 +94,7 @@ The detailed syntax for creating imports helps to execute ``HELP STREAM LOAD`` v
 
 + user/passwd
 
-       Stream load uses the HTTP protocol to create the imported protocol and signs it through the Basic Access authentication. The Doris system verifies user identity and import permissions based on signatures.
+  Stream load uses the HTTP protocol to create the imported protocol and signs it through the Basic Access authentication. The Doris system verifies user identity and import permissions based on signatures.
 
 **Load Parameters**
 
@@ -102,11 +102,11 @@ Stream load uses HTTP protocol, so all parameters related to import tasks are se
 
 + label
 
-       Identity of import task. Each import task has a unique label inside a single database. Label is a user-defined name in the import command. With this label, users can view the execution of the corresponding import task.
+  Identity of import task. Each import task has a unique label inside a single database. Label is a user-defined name in the import command. With this label, users can view the execution of the corresponding import task.
 
-       Another function of label is to prevent users from importing the same data repeatedly. **It is strongly recommended that users use the same label for the same batch of data. This way, repeated requests for the same batch of data will only be accepted once, guaranteeing at-Most-Once**
+  Another function of label is to prevent users from importing the same data repeatedly. **It is strongly recommended that users use the same label for the same batch of data. This way, repeated requests for the same batch of data will only be accepted once, guaranteeing at-Most-Once**
 
-       When the corresponding import operation state of label is CANCELLED, the label can be used again.
+  When the corresponding import operation state of label is CANCELLED, the label can be used again.
 
 
 + column_separator
@@ -125,17 +125,17 @@ Stream load uses HTTP protocol, so all parameters related to import tasks are se
 
 + max\_filter\_ratio
 
-       The maximum tolerance rate of the import task is 0 by default, and the range of values is 0-1. When the import error rate exceeds this value, the import fails.
+  The maximum tolerance rate of the import task is 0 by default, and the range of values is 0-1. When the import error rate exceeds this value, the import fails.
 
-       If the user wishes to ignore the wrong row, the import can be successful by setting this parameter greater than 0.
+  If the user wishes to ignore the wrong row, the import can be successful by setting this parameter greater than 0.
 
-       The calculation formula is as follows:
+  The calculation formula is as follows:
 
     ``` (dpp.abnorm.ALL / (dpp.abnorm.ALL + dpp.norm.ALL ) ) > max_filter_ratio ```
 
-       ``` dpp.abnorm.ALL``` denotes the number of rows whose data quality is not up to standard. Such as type mismatch, column mismatch, length mismatch and so on.
+  ``` dpp.abnorm.ALL``` denotes the number of rows whose data quality is not up to standard. Such as type mismatch, column mismatch, length mismatch and so on.
 
-       ``` dpp.norm.ALL ``` refers to the number of correct data in the import process. The correct amount of data for the import task can be queried by the ``SHOW LOAD` command.
+  ``` dpp.norm.ALL ``` refers to the number of correct data in the import process. The correct amount of data for the import task can be queried by the ``SHOW LOAD` command.
 
   The number of rows in the original file = `dpp.abnorm.ALL + dpp.norm.ALL`
 
@@ -182,10 +182,6 @@ Stream load uses HTTP protocol, so all parameters related to import tasks are se
 
   Stream load import can enable two-stage transaction commit mode: in the stream load process, the data is written and the information is returned to the user. At this time, the data is invisible and the transaction status is `PRECOMMITTED`. After the user manually triggers the commit operation, the data is visible.
 
-+ enable_profile
-
-  <version since="1.2.7">When `enable_profile` is true, the Stream Load profile will be printed to logs (be.INFO).</version>
-
   Example：
 
     1. Initiate a stream load pre-commit operation
@@ -231,6 +227,10 @@ Stream load uses HTTP protocol, so all parameters related to import tasks are se
   }
   ```
 
++ enable_profile
+
+  <version since="1.2.7">When `enable_profile` is true, the Stream Load profile will be printed to logs (be.INFO).</version>
+
 ### Use stream load with SQL
 
 You can add a `sql` parameter to the `Header` to replace the `column_separator`, `line_delimiter`, `where`, `columns` in the previous parameter, which is convenient to use.
@@ -295,13 +295,13 @@ The following main explanations are given for the Stream load import result para
 
 + Status: Import completion status.
 
-       "Success": Indicates successful import.
+  "Success": Indicates successful import.
 
-       "Publish Timeout": This state also indicates that the import has been completed, except that the data may be delayed and visible without retrying.
+  "Publish Timeout": This state also indicates that the import has been completed, except that the data may be delayed and visible without retrying.
 
-       "Label Already Exists": Label duplicate, need to be replaced Label.
+  "Label Already Exists": Label duplicate, need to be replaced Label.
 
-       "Fail": Import failed.
+  "Fail": Import failed.
 
 + ExistingJobStatus: The state of the load job corresponding to the existing Label.
 
@@ -351,15 +351,15 @@ By default, BE does not record Stream Load records. If you want to view records
 
 + stream\_load\_default\_timeout\_second
 
-       The timeout time of the import task (in seconds) will be cancelled by the system if the import task is not completed within the set timeout time, and will become CANCELLED.
+  The timeout time of the import task (in seconds) will be cancelled by the system if the import task is not completed within the set timeout time, and will become CANCELLED.
 
-       At present, Stream load does not support custom import timeout time. All Stream load import timeout time is uniform. The default timeout time is 600 seconds. If the imported source file can no longer complete the import within the specified time, the FE parameter ```stream_load_default_timeout_second``` needs to be adjusted.
+  At present, Stream load does not support custom import timeout time. All Stream load import timeout time is uniform. The default timeout time is 600 seconds. If the imported source file can no longer complete the import within the specified time, the FE parameter ```stream_load_default_timeout_second``` needs to be adjusted.
 
 ### BE configuration
 
 + streaming\_load\_max\_mb
 
-       The maximum import size of Stream load is 10G by default, in MB. If the user's original file exceeds this value, the BE parameter ```streaming_load_max_mb``` needs to be adjusted.
+  The maximum import size of Stream load is 10G by default, in MB. If the user's original file exceeds this value, the BE parameter ```streaming_load_max_mb``` needs to be adjusted.
 
 ## Best Practices
 
@@ -391,17 +391,17 @@ Cluster situation: The concurrency of Stream load is not affected by cluster siz
 
 + Step 1: Does the import file size exceed the default maximum import size of 10G
 
-       ```
-       BE conf
-       streaming_load_max_mb = 16000
-       ```
+  ```
+  BE conf
+  streaming_load_max_mb = 16000
+  ```
 + Step 2: Calculate whether the approximate import time exceeds the default timeout value
 
-       ```
-       Import time 15000/10 = 1500s
-       Over the default timeout time, you need to modify the FE configuration
-       stream_load_default_timeout_second = 1500
-       ```
+  ```
+  Import time 15000/10 = 1500s
+  Over the default timeout time, you need to modify the FE configuration
+  stream_load_default_timeout_second = 1500
+  ```
 
 + Step 3: Create Import Tasks
 
@@ -429,35 +429,35 @@ Doris provides StreamLoad examples in three languages: [Java](https://github.com
 
 * Label Already Exists
 
-       The Label repeat checking steps of Stream load are as follows:
+  The Label repeat checking steps of Stream load are as follows:
 
-       1. Is there an import Label conflict that already exists with other import methods?
+  1. Is there an import Label conflict that already exists with other import methods?
 
-               Because imported Label in Doris system does not distinguish between import methods, there is a problem that other import methods use the same Label.
+    Because imported Label in Doris system does not distinguish between import methods, there is a problem that other import methods use the same Label.
 
-               Through ``SHOW LOAD WHERE LABEL = "xxx"'``, where XXX is a duplicate Label string, see if there is already a Label imported by FINISHED that is the same as the Label created by the user.
+    Through ``SHOW LOAD WHERE LABEL = "xxx"'``, where XXX is a duplicate Label string, see if there is already a Label imported by FINISHED that is the same as the Label created by the user.
 
-       2. Are Stream loads submitted repeatedly for the same job?
+  2. Are Stream loads submitted repeatedly for the same job?
 
-               Since Stream load is an HTTP protocol submission creation import task, HTTP Clients in various languages usually have their own request retry logic. After receiving the first request, the Doris system has started to operate Stream load, but because the result is not returned to the Client side in time, the Client side will retry to create the request. At this point, the Doris system is already operating on the first request, so the second request will be reported to Label Already Exists.
+    Since Stream load is an HTTP protocol submission creation import task, HTTP Clients in various languages usually have their own request retry logic. After receiving the first request, the Doris system has started to operate Stream load, but because the result is not returned to the Client side in time, the Client side will retry to create the request. At this point, the Doris system is already operating on the first request, so the second request will be reported to Label Already Exists.
 
-               To sort out the possible methods mentioned above: Search FE Master's log with Label to see if there are two ``redirect load action to destination = ``redirect load action to destination cases in the same Label. If so, the request is submitted repeatedly by the Client side.
+    To sort out the possible methods mentioned above: Search FE Master's log with Label to see if there are two ``redirect load action to destination = ``redirect load action to destination cases in the same Label. If so, the request is submitted repeatedly by the Client side.
 
-               It is recommended that the user calculate the approximate import time based on the amount of data currently requested, and change the request overtime on the client side to a value greater than the import timeout time according to the import timeout time to avoid multiple submissions of the request by the client side.
+    It is recommended that the user calculate the approximate import time based on the amount of data currently requested, and change the request overtime on the client side to a value greater than the import timeout time according to the import timeout time to avoid multiple submissions of the request by the client side.
 
-       3. Connection reset abnormal
+  3. Connection reset abnormal
 
-         In the community version 0.14.0 and earlier versions, the connection reset exception occurred after Http V2 was enabled, because the built-in web container is tomcat, and Tomcat has pits in 307 (Temporary Redirect). There is a problem with the implementation of this protocol. All In the case of using Stream load to import a large amount of data, a connect reset exception will occur. This is because tomcat started data transmission before the 307 jump, which resulted in the lack of aut [...]
+    In the community version 0.14.0 and earlier versions, the connection reset exception occurred after Http V2 was enabled, because the built-in web container is tomcat, and Tomcat has pits in 307 (Temporary Redirect). There is a problem with the implementation of this protocol. All In the case of using Stream load to import a large amount of data, a connect reset exception will occur. This is because tomcat started data transmission before the 307 jump, which resulted in the lack of au [...]
 
-         After the upgrade, also upgrade the http client version of your program to `4.5.13`，Introduce the following dependencies in your pom.xml file
+    After the upgrade, also upgrade the http client version of your program to `4.5.13`，Introduce the following dependencies in your pom.xml file
 
-         ```xml
-             <dependency>
-               <groupId>org.apache.httpcomponents</groupId>
-               <artifactId>httpclient</artifactId>
-               <version>4.5.13</version>
-             </dependency>
-         ```
+    ```xml
+        <dependency>
+          <groupId>org.apache.httpcomponents</groupId>
+          <artifactId>httpclient</artifactId>
+          <version>4.5.13</version>
+        </dependency>
+    ```
  
 * After enabling the Stream Load record on the BE, the record cannot be queried
 
diff --git a/docs/zh-CN/docs/data-operate/import/import-way/stream-load-manual.md b/docs/zh-CN/docs/data-operate/import/import-way/stream-load-manual.md
index 7153c97978..7e5d6a4dbf 100644
--- a/docs/zh-CN/docs/data-operate/import/import-way/stream-load-manual.md
+++ b/docs/zh-CN/docs/data-operate/import/import-way/stream-load-manual.md
@@ -193,10 +193,6 @@ Stream Load uses the HTTP protocol, so all import-task-related
 
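   Stream load import can enable two-phase transaction commit mode: during Stream load, information is returned to the user as soon as the data has been written. At this point the data is not visible and the transaction status is `PRECOMMITTED`; the data becomes visible only after the user manually triggers the commit operation.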
 
-- enable_profile
-
-  <version since="1.2.7">When `enable_profile` is true, the Stream Load profile will be printed to the be.INFO log.</version>
-
   Example:
 
   1. Initiate a stream load pre-commit operation
@@ -242,6 +238,10 @@ Stream Load uses the HTTP protocol, so all import-task-related
   }
   ```
 
+- enable_profile
+
+  <version since="1.2.7">When `enable_profile` is true, the Stream Load profile will be printed to the be.INFO log.</version>
+
 ### Expressing Stream Load parameters with SQL
 
 
 You can add a `sql` parameter to the Header to replace the `column_separator`, `line_delimiter`, `where`, and `columns` parameters used previously, which makes it more convenient to use.
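
The `max_filter_ratio` hunks in the English manual above state the failure rule as `(dpp.abnorm.ALL / (dpp.abnorm.ALL + dpp.norm.ALL)) > max_filter_ratio`. Below is a minimal Python sketch of that check, assuming both row counters are already known. The function name `load_exceeds_filter_ratio` is hypothetical and not part of Doris, and treating an empty file as not exceeding the ratio is an assumption of this sketch.

```python
def load_exceeds_filter_ratio(abnormal_rows: int, normal_rows: int,
                              max_filter_ratio: float = 0.0) -> bool:
    """Return True if a load would fail under the max_filter_ratio rule.

    abnormal_rows plays the role of dpp.abnorm.ALL (rows failing data-quality
    checks such as type, column or length mismatch); normal_rows plays the
    role of dpp.norm.ALL (rows loaded correctly). Hypothetical helper.
    """
    if not 0.0 <= max_filter_ratio <= 1.0:
        raise ValueError("max_filter_ratio must be in the range 0-1")
    total_rows = abnormal_rows + normal_rows  # rows in the original file
    if total_rows == 0:
        # Assumption: an empty file cannot exceed the ratio.
        return False
    # Strict ">" as in the documented formula: a ratio equal to the limit passes.
    return abnormal_rows / total_rows > max_filter_ratio
```

With the default ratio of 0, a single bad row fails the load; raising the ratio to 0.05 lets 5 bad rows out of 100 through, because the comparison is strict.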
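
The Best Practices steps in the English diff size a 15000 MB file against the 10G `streaming_load_max_mb` default, then estimate `15000/10 = 1500s` against the 600-second `stream_load_default_timeout_second` default. A rough Python sketch of that arithmetic follows. It assumes "10G" means 10240 MB and that the caller supplies the throughput (10 MB/s in the manual's example); `plan_stream_load` is a hypothetical helper, not a Doris API.

```python
import math

# Assumed defaults, taken from the manual's text ("10G" and "600 seconds").
DEFAULT_STREAMING_LOAD_MAX_MB = 10240   # BE: streaming_load_max_mb
DEFAULT_TIMEOUT_SECOND = 600            # FE: stream_load_default_timeout_second


def plan_stream_load(file_size_mb: float, throughput_mb_per_s: float) -> dict:
    """Suggest config overrides for one large Stream Load (illustrative only)."""
    overrides = {}
    # Step 1: the file is larger than the maximum size the BE accepts.
    if file_size_mb > DEFAULT_STREAMING_LOAD_MAX_MB:
        overrides["streaming_load_max_mb"] = math.ceil(file_size_mb)
    # Step 2: the estimated import time exceeds the FE default timeout.
    estimated_seconds = math.ceil(file_size_mb / throughput_mb_per_s)
    if estimated_seconds > DEFAULT_TIMEOUT_SECOND:
        overrides["stream_load_default_timeout_second"] = estimated_seconds
    return overrides
```

For the manual's numbers, `plan_stream_load(15000, 10)` suggests a size limit of 15000 and a timeout of 1500. The manual itself sets `streaming_load_max_mb = 16000`, which leaves headroom; the sketch only returns the smallest value that accommodates the file.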
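
The `user/passwd` and `label` hunks say that Stream Load authenticates with HTTP Basic Access authentication and uses the label to accept a batch at most once, and the Label Already Exists notes recommend a client timeout longer than the import timeout. The Python sketch below only assembles such a request and does not send it. It assumes the FE endpoint `http://{fe_host}:{http_port}/api/{db}/{table}/_stream_load` and the `label`, `max_filter_ratio` and `Expect: 100-continue` headers from the Doris Stream Load manual; the helper name and the 60-second client margin are hypothetical choices.

```python
import base64
from typing import Dict, Optional, Tuple


def build_stream_load_request(fe_host: str, http_port: int, db: str, table: str,
                              user: str, passwd: str, label: str,
                              max_filter_ratio: Optional[float] = None,
                              import_timeout_s: int = 600,
                              client_margin_s: int = 60) -> Tuple[str, Dict[str, str], int]:
    """Assemble (url, headers, client_timeout_s) for one Stream Load PUT.

    Nothing is sent here; hand the result to an HTTP client of your choice.
    """
    url = f"http://{fe_host}:{http_port}/api/{db}/{table}/_stream_load"
    # HTTP Basic Access authentication: base64 of "user:passwd".
    token = base64.b64encode(f"{user}:{passwd}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {token}",
        # Reuse the same label for the same batch so a retry is accepted at most once.
        "label": label,
        # Let the server check the headers before the body is streamed.
        "Expect": "100-continue",
    }
    if max_filter_ratio is not None:
        headers["max_filter_ratio"] = str(max_filter_ratio)
    # Keep the client timeout above the import timeout so the client does not
    # give up and resubmit while Doris is still processing the first request.
    client_timeout_s = import_timeout_s + client_margin_s
    return url, headers, client_timeout_s


if __name__ == "__main__":
    # Hypothetical host, database, table and label, for illustration only.
    print(build_stream_load_request("127.0.0.1", 8030, "example_db",
                                    "example_tbl", "root", "", "example_label_001"))
```

Because the FE answers with a 307 redirect to a BE, whatever client sends this request must repeat the body and the `Authorization` header to the redirect target; with curl this is what `--location-trusted` does.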
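
The result-parameter hunk lists the `Status` values `Success`, `Publish Timeout`, `Label Already Exists` and `Fail`, plus the `ExistingJobStatus` field. A minimal Python sketch of how a client might branch on them follows, assuming the JSON response has already been parsed into a dict and that `ExistingJobStatus` is either `RUNNING` or `FINISHED` (those values do not appear in this diff). The function name and its return strings are hypothetical.

```python
def interpret_stream_load_result(result: dict) -> str:
    """Map a parsed Stream Load response to a client-side action (illustrative)."""
    status = result.get("Status")
    if status == "Success":
        return "loaded"
    if status == "Publish Timeout":
        # The import has completed; visibility may be delayed, so do not retry.
        return "loaded"
    if status == "Label Already Exists":
        existing = result.get("ExistingJobStatus")
        if existing == "FINISHED":
            return "already_loaded"   # an earlier request with this label succeeded
        if existing == "RUNNING":
            return "in_progress"      # wait for the earlier request, do not resubmit
        return "label_conflict"       # the label must be replaced
    if status == "Fail":
        return "failed"
    raise ValueError(f"unexpected Status: {status!r}")
```

Treating `Publish Timeout` like `Success` follows the manual's note that the data becomes visible later without retrying.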

