This is an automated email from the ASF dual-hosted git repository.
jiafengzheng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 9166bc0c6c load json format fix
9166bc0c6c is described below
commit 9166bc0c6c0f1f4f34926550469a44f34e1879e2
Author: jiafeng.zhang <[email protected]>
AuthorDate: Fri Jul 22 11:39:56 2022 +0800
load json format fix
load json format fix
---
.../import/import-way/load-json-format.md | 70 ++++++++++++++++------
.../import/import-way/load-json-format.md | 68 ++++++++++++++++-----
2 files changed, 105 insertions(+), 33 deletions(-)
diff --git a/docs/data-operate/import/import-way/load-json-format.md
b/docs/data-operate/import/import-way/load-json-format.md
index d6097e2137..eafdc543e4 100644
--- a/docs/data-operate/import/import-way/load-json-format.md
+++ b/docs/data-operate/import/import-way/load-json-format.md
@@ -66,19 +66,32 @@ Currently only the following two Json formats are supported:
This method must be used with the setting `strip_outer_array=true`. Doris
will expand the array when parsing, and then parse each Object in turn as a row
of data.
2. A single row of data represented by Object
-
Json format with Object as root node. The entire Object represents a row of
data to be imported. An example is as follows:
````json
{ "id": 123, "city" : "beijing"}
````
-
+
````json
{ "id": 123, "city" : { "name" : "beijing", "region" : "haidian" }}
````
-
+
This method is usually used for the Routine Load import method, such as
representing a message in Kafka, that is, a row of data.
+3. Multiple lines of Object data separated by a fixed delimiter
+
+ A row of data represented by Object represents a row of data to be
imported. The example is as follows:
+
+ ````json
+ { "id": 123, "city" : "beijing"}
+ { "id": 456, "city" : "shanghai"}
+ ...
+ ````
+
+ This method is typically used for Stream Load import methods to represent
multiple rows of data in a batch of imported data.
+
+ This method must be used with the setting `read_json_by_line=true`, the
special delimiter also needs to specify the `line_delimiter` parameter, the
default is `\n`. When Doris parses, it will be separated according to the
delimiter, and then parse each line of Object as a line of data.
+
### fuzzy_parse parameters
In [STREAM
LOAD](../../../sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md)
`fuzzy_parse` parameter can be added to speed up JSON Data import efficiency.
@@ -380,7 +393,7 @@ code INT NULL
100 beijing 1
````
-3. Import multiple rows of data
+3. Import multiple rows of data as Array
````json
[
@@ -416,24 +429,45 @@ code INT NULL
105 {"order1":["guangzhou"]} 6
````
-4. Transform the imported data
+4. Import multi-line data as multi-line Object
- The data is still the multi-line data in Example 3, and now it is necessary
to add 1 to the `code` column in the imported data before importing.
+ ```json
+ {"id": 100, "city": "beijing", "code" : 1}
+ {"id": 101, "city": "shanghai"}
+ {"id": 102, "city": "tianjin", "code" : 3}
+ {"id": 103, "city": "chongqing", "code" : 4}
+ ```
- ```bash
- curl --location-trusted -u user:passwd -H "format: json" -H "jsonpaths:
[\"$.id\",\"$.city\",\"$.code\"]" - H "strip_outer_array: true" -H "columns:
id, city, tmpc, code=tmpc+1" -T data.json
http://localhost:8030/api/db1/tbl1/_stream_load
- ````
+ StreamLoad import:
- Import result:
+```bash
+curl --location-trusted -u user:passwd -H "format: json" -H
"read_json_by_line: true" -T data.json
http://localhost:8030/api/db1/tbl1/_stream_load
+```
+ Import result:
+
+ 100 beijing 1
+ 101 shanghai NULL
+ 102 tianjin 3
+ 103 chongqing 4
- ````text
- 100 beijing 2
- 101 shanghai NULL
- 102 tianjin 4
- 103 chongqing 5
- 104 ["zhejiang","guangzhou"] 6
- 105 {"order1":["guangzhou"]} 7
- ````
+5. Transform the imported data
+
+The data is still the multi-line data in Example 3, and now it is necessary to
add 1 to the `code` column in the imported data before importing.
+
+```bash
+curl --location-trusted -u user:passwd -H "format: json" -H "jsonpaths:
[\"$.id\",\"$.city\",\"$.code\"]" - H "strip_outer_array: true" -H "columns:
id, city, tmpc, code=tmpc+1" -T data.json
http://localhost:8030/api/db1/tbl1/_stream_load
+````
+
+Import result:
+
+````text
+100 beijing 2
+101 shanghai NULL
+102 tianjin 4
+103 chongqing 5
+104 ["zhejiang","guangzhou"] 6
+105 {"order1":["guangzhou"]} 7
+````
### Routine Load
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/load-json-format.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/load-json-format.md
index 97a4ebf99c..25cc84b5db 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/load-json-format.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/load-json-format.md
@@ -78,6 +78,20 @@ Doris 支持导入 JSON 格式的数据。本文档主要说明在进行JSON格
```
这种方式通常用于 Routine Load 导入方式,如表示 Kafka 中的一条消息,即一行数据。
+
+2. 以固定分隔符分隔的多行 Object 数据
+
+ Object表示的一行数据即表示要导入的一行数据,示例如下:
+
+ ```json
+ { "id": 123, "city" : "beijing"}
+ { "id": 456, "city" : "shanghai"}
+ ...
+ ```
+
+ 这种方式通常用于 Stream Load 导入方式,以便在一批导入数据中表示多行数据。
+
+ 这种方式必须配合设置 `read_json_by_line=true`
使用,特殊分隔符还需要指定`line_delimiter`参数,默认`\n`。Doris 在解析时会按照分隔符分隔,然后解析其中的每一行 Object
作为一行数据。
### fuzzy_parse 参数
@@ -379,7 +393,7 @@ code INT NULL
100 beijing 1
```
-3. 导入多行数据
+3. 以 Array 形式导入多行数据
```json
[
@@ -415,24 +429,48 @@ code INT NULL
105 {"order1":["guangzhou"]} 6
```
-4. 对导入数据进行转换
+4. 以 多行Object 形式导入多行数据
- 数据依然是示例3中的多行数据,现需要对导入数据中的 `code` 列加1后导入。
+ ```
+ {"id": 100, "city": "beijing", "code" : 1}
+ {"id": 101, "city": "shanghai"}
+ {"id": 102, "city": "tianjin", "code" : 3}
+ {"id": 103, "city": "chongqing", "code" : 4}
+ ```
- ```bash
- curl --location-trusted -u user:passwd -H "format: json" -H "jsonpaths:
[\"$.id\",\"$.city\",\"$.code\"]" -H "strip_outer_array: true" -H "columns: id,
city, tmpc, code=tmpc+1" -T data.json
http://localhost:8030/api/db1/tbl1/_stream_load
- ```
+StreamLoad导入:
+
+```bash
+curl --location-trusted -u user:passwd -H "format: json" -H
"read_json_by_line: true" -T data.json
http://localhost:8030/api/db1/tbl1/_stream_load
+```
- 导入结果:
+导入结果:
- ```text
- 100 beijing 2
- 101 shanghai NULL
- 102 tianjin 4
- 103 chongqing 5
- 104 ["zhejiang","guangzhou"] 6
- 105 {"order1":["guangzhou"]} 7
- ```
+```
+100 beijing 1
+101 shanghai NULL
+102 tianjin 3
+103 chongqing 4
+```
+
+5. 对导入数据进行转换
+
+数据依然是示例3中的多行数据,现需要对导入数据中的 `code` 列加1后导入。
+
+```bash
+curl --location-trusted -u user:passwd -H "format: json" -H "jsonpaths:
[\"$.id\",\"$.city\",\"$.code\"]" -H "strip_outer_array: true" -H "columns: id,
city, tmpc, code=tmpc+1" -T data.json
http://localhost:8030/api/db1/tbl1/_stream_load
+```
+
+导入结果:
+
+```text
+100 beijing 2
+101 shanghai NULL
+102 tianjin 4
+103 chongqing 5
+104 ["zhejiang","guangzhou"] 6
+105 {"order1":["guangzhou"]} 7
+```
### Routine Load
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]