This is an automated email from the ASF dual-hosted git repository.
kassiez pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 4b296e39981 [doc](load) optimize column mapping section doc (#2465)
4b296e39981 is described below
commit 4b296e399817b6a9da9160438825eef8d67be068
Author: hui lai <[email protected]>
AuthorDate: Fri Jun 20 20:56:26 2025 +0800
[doc](load) optimize column mapping section doc (#2465)
## Versions
- [x] dev
- [x] 3.0
- [x] 2.1
- [ ] 2.0
## Languages
- [ ] Chinese
- [ ] English
## Docs Checklist
- [ ] Checked by AI
- [ ] Test Cases Built
---
docs/data-operate/import/load-data-convert.md | 175 ++++++++++++++++++++
.../data-operate/import/load-data-convert.md | 176 +++++++++++++++++++++
.../data-operate/import/load-data-convert.md | 176 +++++++++++++++++++++
.../data-operate/import/load-data-convert.md | 176 +++++++++++++++++++++
static/images/load-data-convert-csv-en.png | Bin 0 -> 189223 bytes
static/images/load-data-convert-csv.png | Bin 0 -> 155162 bytes
static/images/load-data-convert-json1-en.png | Bin 0 -> 207961 bytes
static/images/load-data-convert-json1.png | Bin 0 -> 171052 bytes
static/images/load-data-convert-json2-en.png | Bin 0 -> 170986 bytes
static/images/load-data-convert-json2.png | Bin 0 -> 151039 bytes
.../data-operate/import/load-data-convert.md | 175 ++++++++++++++++++++
.../data-operate/import/load-data-convert.md | 175 ++++++++++++++++++++
12 files changed, 1053 insertions(+)
diff --git a/docs/data-operate/import/load-data-convert.md
b/docs/data-operate/import/load-data-convert.md
index 386b2ad5b6a..ff6c9ac6ab1 100644
--- a/docs/data-operate/import/load-data-convert.md
+++ b/docs/data-operate/import/load-data-convert.md
@@ -114,6 +114,181 @@ Column mapping is used to define the correspondence between source data columns
- The order of source data columns and target table columns is inconsistent
- The number of source data columns and target table columns is inconsistent
+### Implementation Principle
+
+Column mapping implementation can be divided into two steps:
+
+**Step 1: Data Source Parsing** - Parse the raw data into intermediate variables according to the data format.
+**Step 2: Column Mapping and Assignment** - Assign the intermediate variables to the target table columns by column name.
+
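The two-step flow above can be sketched in Python. This is a hypothetical illustration of the described flow, not Doris internals; `parse_source` and `map_columns` are invented names:

```python
import json

# Step 1: parse a raw JSON line into intermediate variables keyed by source name.
def parse_source(line: str) -> dict:
    return json.loads(line)

# Step 2: assign intermediate variables to target table columns via a mapping
# of target column name -> source variable name.
def map_columns(intermediate: dict, mapping: dict) -> dict:
    return {target: intermediate.get(source) for target, source in mapping.items()}

row = parse_source('{"k1":1,"k2":"100","k3":"beijing","k4":1.1}')
mapped = map_columns(row, {"col1": "k1", "col2": "k3", "col3": "k2", "col4": "k4"})
print(mapped)  # {'col1': 1, 'col2': 'beijing', 'col3': '100', 'col4': 1.1}
```

Each source row goes through the same two steps before being written to the target table.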
+The following are processing flows for three different data formats:
+
+#### Load CSV Format Data
+
+![Load CSV format data](/images/load-data-convert-csv-en.png)
+
+#### Load JSON Format Data with Specified jsonpaths
+
+![Load JSON format data with specified jsonpaths](/images/load-data-convert-json1-en.png)
+
+#### Load JSON Format Data without Specified jsonpaths
+
+![Load JSON format data without specified jsonpaths](/images/load-data-convert-json2-en.png)
+
+### Load JSON Data with Specified jsonpaths
+Assume the following source data:
+```plain
+{"k1":1,"k2":"100","k3":"beijing","k4":1.1}
+{"k1":2,"k2":"200","k3":"shanghai","k4":1.2}
+{"k1":3,"k2":"300","k3":"guangzhou","k4":1.3}
+{"k1":4,"k2":"\\N","k3":"chongqing","k4":1.4}
+```
+
+##### Create Target Table
+```sql
+CREATE TABLE example_table
+(
+ col1 INT,
+ col2 STRING,
+ col3 INT,
+ col4 DOUBLE
+) ENGINE = OLAP
+DUPLICATE KEY(col1)
+DISTRIBUTED BY HASH(col1) BUCKETS 1;
+```
+
+##### Load Data
+- Stream Load
+```shell
+curl --location-trusted -u user:passwd \
+ -H "columns:col1, col3, col2, col4" \
+ -H "jsonpaths:[\"$.k1\", \"$.k2\", \"$.k3\", \"$.k4\"]" \
+ -H "format:json" \
+ -H "read_json_by_line:true" \
+ -T data.json \
+ -X PUT \
+ http://<fe_ip>:<fe_http_port>/api/example_db/example_table/_stream_load
+```
+
+- Broker Load
+```sql
+LOAD LABEL example_db.label_broker
+(
+ DATA INFILE("s3://bucket_name/data.json")
+ INTO TABLE example_table
+ FORMAT AS "json"
+ (col1, col3, col2, col4)
+ PROPERTIES
+ (
+ "jsonpaths" = "[\"$.k1\", \"$.k2\", \"$.k3\", \"$.k4\"]"
+ )
+)
+WITH s3 (...);
+```
+
+- Routine Load
+```sql
+CREATE ROUTINE LOAD example_db.example_routine_load ON example_table
+COLUMNS(col1, col3, col2, col4)
+PROPERTIES
+(
+ "format" = "json",
+ "jsonpaths" = "[\"$.k1\", \"$.k2\", \"$.k3\", \"$.k4\"]",
+ "read_json_by_line" = "true"
+)
+FROM KAFKA (...);
+```
+
+##### Query Results
+```
+mysql> SELECT * FROM example_table;
++------+-----------+------+------+
+| col1 | col2      | col3 | col4 |
++------+-----------+------+------+
+| 1    | beijing   | 100  | 1.1  |
+| 2    | shanghai  | 200  | 1.2  |
+| 3    | guangzhou | 300  | 1.3  |
+| 4    | chongqing | NULL | 1.4  |
++------+-----------+------+------+
+```
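The result above follows from positional binding: `jsonpaths` extracts values in order, and the `columns` header names each position, so `$.k2` (the second path) lands in `col3`. A minimal Python sketch of this assumed behavior, with plain key lookups standing in for a real JSONPath engine:

```python
import json

jsonpaths = ["$.k1", "$.k2", "$.k3", "$.k4"]   # extraction order
columns = ["col1", "col3", "col2", "col4"]     # names each extracted position

def load_line(line: str) -> dict:
    obj = json.loads(line)
    values = [obj.get(p.removeprefix("$.")) for p in jsonpaths]  # step 1: extract by path
    return dict(zip(columns, values))                            # step 2: bind by position

print(load_line('{"k1":2,"k2":"200","k3":"shanghai","k4":1.2}'))
# {'col1': 2, 'col3': '200', 'col2': 'shanghai', 'col4': 1.2}
```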
+
+### Load JSON Data without Specified jsonpaths
+Assume the following source data:
+```plain
+{"k1":1,"k2":"100","k3":"beijing","k4":1.1}
+{"k1":2,"k2":"200","k3":"shanghai","k4":1.2}
+{"k1":3,"k2":"300","k3":"guangzhou","k4":1.3}
+{"k1":4,"k2":"\\N","k3":"chongqing","k4":1.4}
+```
+
+##### Create Target Table
+```sql
+CREATE TABLE example_table
+(
+ col1 INT,
+ col2 STRING,
+ col3 INT,
+ col4 DOUBLE
+) ENGINE = OLAP
+DUPLICATE KEY(col1)
+DISTRIBUTED BY HASH(col1) BUCKETS 1;
+```
+
+##### Load Data
+- Stream Load
+```shell
+curl --location-trusted -u user:passwd \
+ -H "columns:k1, k3, k2, k4, col1 = k1, col2 = k3, col3 = k2, col4 = k4" \
+ -H "format:json" \
+ -H "read_json_by_line:true" \
+ -T data.json \
+ -X PUT \
+ http://<fe_ip>:<fe_http_port>/api/example_db/example_table/_stream_load
+```
+
+- Broker Load
+```sql
+LOAD LABEL example_db.label_broker
+(
+ DATA INFILE("s3://bucket_name/data.json")
+ INTO TABLE example_table
+ FORMAT AS "json"
+ (k1, k3, k2, k4)
+ SET (
+ col1 = k1,
+ col2 = k3,
+ col3 = k2,
+ col4 = k4
+ )
+)
+WITH s3 (...);
+```
+
+- Routine Load
+```sql
+CREATE ROUTINE LOAD example_db.example_routine_load ON example_table
+COLUMNS(k1, k3, k2, k4, col1 = k1, col2 = k3, col3 = k2, col4 = k4)
+PROPERTIES
+(
+ "format" = "json",
+ "read_json_by_line" = "true"
+)
+FROM KAFKA (...);
+```
+
+##### Query Results
+```
+mysql> SELECT * FROM example_table;
++------+-----------+------+------+
+| col1 | col2      | col3 | col4 |
++------+-----------+------+------+
+| 1    | beijing   | 100  | 1.1  |
+| 2    | shanghai  | 200  | 1.2  |
+| 3    | guangzhou | 300  | 1.3  |
+| 4    | chongqing | NULL | 1.4  |
++------+-----------+------+------+
+```
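Without `jsonpaths`, matching is by name: the JSON keys fill the listed source columns, and the mapping expressions (`col1 = k1`, ...) assign them to target columns. A minimal Python sketch of this assumed behavior (`\N` marking NULL is modeled as `None`; the names here are illustrative):

```python
import json

source_columns = ["k1", "k3", "k2", "k4"]
# SET-style assignments from the load statement: target column -> source column.
assignments = {"col1": "k1", "col2": "k3", "col3": "k2", "col4": "k4"}

def coerce(value):
    # In the source data, the literal \N marks a NULL value.
    return None if value == "\\N" else value

def load_line(line: str) -> dict:
    obj = json.loads(line)
    intermediate = {c: obj.get(c) for c in source_columns}  # step 1: match JSON keys by name
    return {t: coerce(intermediate[s]) for t, s in assignments.items()}  # step 2: assign

print(load_line('{"k1":4,"k2":"\\\\N","k3":"chongqing","k4":1.4}'))
# {'col1': 4, 'col2': 'chongqing', 'col3': None, 'col4': 1.4}
```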
+
### Adjusting Column Order
Suppose we have the following source data (column names are for illustration
purposes only, and there is no actual header):
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/load-data-convert.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/load-data-convert.md
index f789741df3e..33ad0f7687a 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/load-data-convert.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/load-data-convert.md
@@ -114,6 +114,182 @@ Insert Into can perform data transformation directly in the `SELECT` statement, using `WHERE`
- The order of source data columns and target table columns is inconsistent
- The number of source data columns and target table columns is inconsistent
+### Implementation Principle
+
+Column mapping is implemented in two core steps:
+
+**Step 1: Data Source Parsing** - Parse the raw data into intermediate variables according to the data format.
+**Step 2: Column Mapping and Assignment** - Assign the intermediate variables to the target table columns by column name.
+
+The following are the processing flows for three different data formats:
+
+#### Load CSV Format Data
+
+![Load CSV format data](/images/load-data-convert-csv.png)
+
+#### Load JSON Format Data with Specified jsonpaths
+
+![Load JSON format data with specified jsonpaths](/images/load-data-convert-json1.png)
+
+#### Load JSON Format Data without Specified jsonpaths
+
+![Load JSON format data without specified jsonpaths](/images/load-data-convert-json2.png)
+
+### Load JSON Data with Specified jsonpaths
+Assume the following source data:
+```plain
+{"k1":1,"k2":"100","k3":"beijing","k4":1.1}
+{"k1":2,"k2":"200","k3":"shanghai","k4":1.2}
+{"k1":3,"k2":"300","k3":"guangzhou","k4":1.3}
+{"k1":4,"k2":"\\N","k3":"chongqing","k4":1.4}
+```
+
+##### Create Target Table
+```sql
+CREATE TABLE example_table
+(
+ col1 INT,
+ col2 STRING,
+ col3 INT,
+ col4 DOUBLE
+) ENGINE = OLAP
+DUPLICATE KEY(col1)
+DISTRIBUTED BY HASH(col1) BUCKETS 1;
+```
+
+##### Load Data
+- Stream Load
+```shell
+curl --location-trusted -u user:passwd \
+ -H "columns:col1, col3, col2, col4" \
+ -H "jsonpaths:[\"$.k1\", \"$.k2\", \"$.k3\", \"$.k4\"]" \
+ -H "format:json" \
+ -H "read_json_by_line:true" \
+ -T data.json \
+ -X PUT \
+ http://<fe_ip>:<fe_http_port>/api/example_db/example_table/_stream_load
+```
+
+- Broker Load
+```sql
+LOAD LABEL example_db.label_broker
+(
+ DATA INFILE("s3://bucket_name/data.json")
+ INTO TABLE example_table
+ FORMAT AS "json"
+ (col1, col3, col2, col4)
+ PROPERTIES
+ (
+ "jsonpaths" = "[\"$.k1\", \"$.k2\", \"$.k3\", \"$.k4\"]"
+ )
+)
+WITH s3 (...);
+```
+
+- Routine Load
+```sql
+CREATE ROUTINE LOAD example_db.example_routine_load ON example_table
+COLUMNS(col1, col3, col2, col4)
+PROPERTIES
+(
+ "format" = "json",
+ "jsonpaths" = "[\"$.k1\", \"$.k2\", \"$.k3\", \"$.k4\"]",
+ "read_json_by_line" = "true"
+)
+FROM KAFKA (...);
+```
+
+##### Query Results
+```
+mysql> SELECT * FROM example_table;
++------+-----------+------+------+
+| col1 | col2      | col3 | col4 |
++------+-----------+------+------+
+| 1    | beijing   | 100  | 1.1  |
+| 2    | shanghai  | 200  | 1.2  |
+| 3    | guangzhou | 300  | 1.3  |
+| 4    | chongqing | NULL | 1.4  |
++------+-----------+------+------+
+```
+
+### Load JSON Data without Specified jsonpaths
+Assume the following source data:
+```plain
+{"k1":1,"k2":"100","k3":"beijing","k4":1.1}
+{"k1":2,"k2":"200","k3":"shanghai","k4":1.2}
+{"k1":3,"k2":"300","k3":"guangzhou","k4":1.3}
+{"k1":4,"k2":"\\N","k3":"chongqing","k4":1.4}
+```
+
+##### Create Target Table
+```sql
+CREATE TABLE example_table
+(
+ col1 INT,
+ col2 STRING,
+ col3 INT,
+ col4 DOUBLE
+) ENGINE = OLAP
+DUPLICATE KEY(col1)
+DISTRIBUTED BY HASH(col1) BUCKETS 1;
+```
+
+##### Load Data
+- Stream Load
+```shell
+curl --location-trusted -u user:passwd \
+ -H "columns:k1, k3, k2, k4, col1 = k1, col2 = k3, col3 = k2, col4 = k4" \
+ -H "format:json" \
+ -H "read_json_by_line:true" \
+ -T data.json \
+ -X PUT \
+ http://<fe_ip>:<fe_http_port>/api/example_db/example_table/_stream_load
+```
+
+- Broker Load
+```sql
+LOAD LABEL example_db.label_broker
+(
+ DATA INFILE("s3://bucket_name/data.json")
+ INTO TABLE example_table
+ FORMAT AS "json"
+ (k1, k3, k2, k4)
+ SET (
+ col1 = k1,
+ col2 = k3,
+ col3 = k2,
+ col4 = k4
+ )
+)
+WITH s3 (...);
+```
+
+- Routine Load
+```sql
+CREATE ROUTINE LOAD example_db.example_routine_load ON example_table
+COLUMNS(k1, k3, k2, k4, col1 = k1, col2 = k3, col3 = k2, col4 = k4)
+PROPERTIES
+(
+ "format" = "json",
+ "read_json_by_line" = "true"
+)
+FROM KAFKA (...);
+```
+
+##### Query Results
+```
+mysql> SELECT * FROM example_table;
++------+-----------+------+------+
+| col1 | col2      | col3 | col4 |
++------+-----------+------+------+
+| 1    | beijing   | 100  | 1.1  |
+| 2    | shanghai  | 200  | 1.2  |
+| 3    | guangzhou | 300  | 1.3  |
+| 4    | chongqing | NULL | 1.4  |
++------+-----------+------+------+
+```
+
### Adjusting Column Order
Assume the following source data (column names are for illustration only; there is no actual header):
```plain
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/load-data-convert.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/load-data-convert.md
index f789741df3e..33ad0f7687a 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/load-data-convert.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/import/load-data-convert.md
@@ -114,6 +114,182 @@ Insert Into can perform data transformation directly in the `SELECT` statement, using `WHERE`
- The order of source data columns and target table columns is inconsistent
- The number of source data columns and target table columns is inconsistent
+### Implementation Principle
+
+Column mapping is implemented in two core steps:
+
+**Step 1: Data Source Parsing** - Parse the raw data into intermediate variables according to the data format.
+**Step 2: Column Mapping and Assignment** - Assign the intermediate variables to the target table columns by column name.
+
+The following are the processing flows for three different data formats:
+
+#### Load CSV Format Data
+
+![Load CSV format data](/images/load-data-convert-csv.png)
+
+#### Load JSON Format Data with Specified jsonpaths
+
+![Load JSON format data with specified jsonpaths](/images/load-data-convert-json1.png)
+
+#### Load JSON Format Data without Specified jsonpaths
+
+![Load JSON format data without specified jsonpaths](/images/load-data-convert-json2.png)
+
+### Load JSON Data with Specified jsonpaths
+Assume the following source data:
+```plain
+{"k1":1,"k2":"100","k3":"beijing","k4":1.1}
+{"k1":2,"k2":"200","k3":"shanghai","k4":1.2}
+{"k1":3,"k2":"300","k3":"guangzhou","k4":1.3}
+{"k1":4,"k2":"\\N","k3":"chongqing","k4":1.4}
+```
+
+##### Create Target Table
+```sql
+CREATE TABLE example_table
+(
+ col1 INT,
+ col2 STRING,
+ col3 INT,
+ col4 DOUBLE
+) ENGINE = OLAP
+DUPLICATE KEY(col1)
+DISTRIBUTED BY HASH(col1) BUCKETS 1;
+```
+
+##### Load Data
+- Stream Load
+```shell
+curl --location-trusted -u user:passwd \
+ -H "columns:col1, col3, col2, col4" \
+ -H "jsonpaths:[\"$.k1\", \"$.k2\", \"$.k3\", \"$.k4\"]" \
+ -H "format:json" \
+ -H "read_json_by_line:true" \
+ -T data.json \
+ -X PUT \
+ http://<fe_ip>:<fe_http_port>/api/example_db/example_table/_stream_load
+```
+
+- Broker Load
+```sql
+LOAD LABEL example_db.label_broker
+(
+ DATA INFILE("s3://bucket_name/data.json")
+ INTO TABLE example_table
+ FORMAT AS "json"
+ (col1, col3, col2, col4)
+ PROPERTIES
+ (
+ "jsonpaths" = "[\"$.k1\", \"$.k2\", \"$.k3\", \"$.k4\"]"
+ )
+)
+WITH s3 (...);
+```
+
+- Routine Load
+```sql
+CREATE ROUTINE LOAD example_db.example_routine_load ON example_table
+COLUMNS(col1, col3, col2, col4)
+PROPERTIES
+(
+ "format" = "json",
+ "jsonpaths" = "[\"$.k1\", \"$.k2\", \"$.k3\", \"$.k4\"]",
+ "read_json_by_line" = "true"
+)
+FROM KAFKA (...);
+```
+
+##### Query Results
+```
+mysql> SELECT * FROM example_table;
++------+-----------+------+------+
+| col1 | col2      | col3 | col4 |
++------+-----------+------+------+
+| 1    | beijing   | 100  | 1.1  |
+| 2    | shanghai  | 200  | 1.2  |
+| 3    | guangzhou | 300  | 1.3  |
+| 4    | chongqing | NULL | 1.4  |
++------+-----------+------+------+
+```
+
+### Load JSON Data without Specified jsonpaths
+Assume the following source data:
+```plain
+{"k1":1,"k2":"100","k3":"beijing","k4":1.1}
+{"k1":2,"k2":"200","k3":"shanghai","k4":1.2}
+{"k1":3,"k2":"300","k3":"guangzhou","k4":1.3}
+{"k1":4,"k2":"\\N","k3":"chongqing","k4":1.4}
+```
+
+##### Create Target Table
+```sql
+CREATE TABLE example_table
+(
+ col1 INT,
+ col2 STRING,
+ col3 INT,
+ col4 DOUBLE
+) ENGINE = OLAP
+DUPLICATE KEY(col1)
+DISTRIBUTED BY HASH(col1) BUCKETS 1;
+```
+
+##### Load Data
+- Stream Load
+```shell
+curl --location-trusted -u user:passwd \
+ -H "columns:k1, k3, k2, k4, col1 = k1, col2 = k3, col3 = k2, col4 = k4" \
+ -H "format:json" \
+ -H "read_json_by_line:true" \
+ -T data.json \
+ -X PUT \
+ http://<fe_ip>:<fe_http_port>/api/example_db/example_table/_stream_load
+```
+
+- Broker Load
+```sql
+LOAD LABEL example_db.label_broker
+(
+ DATA INFILE("s3://bucket_name/data.json")
+ INTO TABLE example_table
+ FORMAT AS "json"
+ (k1, k3, k2, k4)
+ SET (
+ col1 = k1,
+ col2 = k3,
+ col3 = k2,
+ col4 = k4
+ )
+)
+WITH s3 (...);
+```
+
+- Routine Load
+```sql
+CREATE ROUTINE LOAD example_db.example_routine_load ON example_table
+COLUMNS(k1, k3, k2, k4, col1 = k1, col2 = k3, col3 = k2, col4 = k4)
+PROPERTIES
+(
+ "format" = "json",
+ "read_json_by_line" = "true"
+)
+FROM KAFKA (...);
+```
+
+##### Query Results
+```
+mysql> SELECT * FROM example_table;
++------+-----------+------+------+
+| col1 | col2      | col3 | col4 |
++------+-----------+------+------+
+| 1    | beijing   | 100  | 1.1  |
+| 2    | shanghai  | 200  | 1.2  |
+| 3    | guangzhou | 300  | 1.3  |
+| 4    | chongqing | NULL | 1.4  |
++------+-----------+------+------+
+```
+
### Adjusting Column Order
Assume the following source data (column names are for illustration only; there is no actual header):
```plain
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/load-data-convert.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/load-data-convert.md
index f789741df3e..33ad0f7687a 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/load-data-convert.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/import/load-data-convert.md
@@ -114,6 +114,182 @@ Insert Into can perform data transformation directly in the `SELECT` statement, using `WHERE`
- The order of source data columns and target table columns is inconsistent
- The number of source data columns and target table columns is inconsistent
+### Implementation Principle
+
+Column mapping is implemented in two core steps:
+
+**Step 1: Data Source Parsing** - Parse the raw data into intermediate variables according to the data format.
+**Step 2: Column Mapping and Assignment** - Assign the intermediate variables to the target table columns by column name.
+
+The following are the processing flows for three different data formats:
+
+#### Load CSV Format Data
+
+![Load CSV format data](/images/load-data-convert-csv.png)
+
+#### Load JSON Format Data with Specified jsonpaths
+
+![Load JSON format data with specified jsonpaths](/images/load-data-convert-json1.png)
+
+#### Load JSON Format Data without Specified jsonpaths
+
+![Load JSON format data without specified jsonpaths](/images/load-data-convert-json2.png)
+
+### Load JSON Data with Specified jsonpaths
+Assume the following source data:
+```plain
+{"k1":1,"k2":"100","k3":"beijing","k4":1.1}
+{"k1":2,"k2":"200","k3":"shanghai","k4":1.2}
+{"k1":3,"k2":"300","k3":"guangzhou","k4":1.3}
+{"k1":4,"k2":"\\N","k3":"chongqing","k4":1.4}
+```
+
+##### Create Target Table
+```sql
+CREATE TABLE example_table
+(
+ col1 INT,
+ col2 STRING,
+ col3 INT,
+ col4 DOUBLE
+) ENGINE = OLAP
+DUPLICATE KEY(col1)
+DISTRIBUTED BY HASH(col1) BUCKETS 1;
+```
+
+##### Load Data
+- Stream Load
+```shell
+curl --location-trusted -u user:passwd \
+ -H "columns:col1, col3, col2, col4" \
+ -H "jsonpaths:[\"$.k1\", \"$.k2\", \"$.k3\", \"$.k4\"]" \
+ -H "format:json" \
+ -H "read_json_by_line:true" \
+ -T data.json \
+ -X PUT \
+ http://<fe_ip>:<fe_http_port>/api/example_db/example_table/_stream_load
+```
+
+- Broker Load
+```sql
+LOAD LABEL example_db.label_broker
+(
+ DATA INFILE("s3://bucket_name/data.json")
+ INTO TABLE example_table
+ FORMAT AS "json"
+ (col1, col3, col2, col4)
+ PROPERTIES
+ (
+ "jsonpaths" = "[\"$.k1\", \"$.k2\", \"$.k3\", \"$.k4\"]"
+ )
+)
+WITH s3 (...);
+```
+
+- Routine Load
+```sql
+CREATE ROUTINE LOAD example_db.example_routine_load ON example_table
+COLUMNS(col1, col3, col2, col4)
+PROPERTIES
+(
+ "format" = "json",
+ "jsonpaths" = "[\"$.k1\", \"$.k2\", \"$.k3\", \"$.k4\"]",
+ "read_json_by_line" = "true"
+)
+FROM KAFKA (...);
+```
+
+##### Query Results
+```
+mysql> SELECT * FROM example_table;
++------+-----------+------+------+
+| col1 | col2      | col3 | col4 |
++------+-----------+------+------+
+| 1    | beijing   | 100  | 1.1  |
+| 2    | shanghai  | 200  | 1.2  |
+| 3    | guangzhou | 300  | 1.3  |
+| 4    | chongqing | NULL | 1.4  |
++------+-----------+------+------+
+```
+
+### Load JSON Data without Specified jsonpaths
+Assume the following source data:
+```plain
+{"k1":1,"k2":"100","k3":"beijing","k4":1.1}
+{"k1":2,"k2":"200","k3":"shanghai","k4":1.2}
+{"k1":3,"k2":"300","k3":"guangzhou","k4":1.3}
+{"k1":4,"k2":"\\N","k3":"chongqing","k4":1.4}
+```
+
+##### Create Target Table
+```sql
+CREATE TABLE example_table
+(
+ col1 INT,
+ col2 STRING,
+ col3 INT,
+ col4 DOUBLE
+) ENGINE = OLAP
+DUPLICATE KEY(col1)
+DISTRIBUTED BY HASH(col1) BUCKETS 1;
+```
+
+##### Load Data
+- Stream Load
+```shell
+curl --location-trusted -u user:passwd \
+ -H "columns:k1, k3, k2, k4, col1 = k1, col2 = k3, col3 = k2, col4 = k4" \
+ -H "format:json" \
+ -H "read_json_by_line:true" \
+ -T data.json \
+ -X PUT \
+ http://<fe_ip>:<fe_http_port>/api/example_db/example_table/_stream_load
+```
+
+- Broker Load
+```sql
+LOAD LABEL example_db.label_broker
+(
+ DATA INFILE("s3://bucket_name/data.json")
+ INTO TABLE example_table
+ FORMAT AS "json"
+ (k1, k3, k2, k4)
+ SET (
+ col1 = k1,
+ col2 = k3,
+ col3 = k2,
+ col4 = k4
+ )
+)
+WITH s3 (...);
+```
+
+- Routine Load
+```sql
+CREATE ROUTINE LOAD example_db.example_routine_load ON example_table
+COLUMNS(k1, k3, k2, k4, col1 = k1, col2 = k3, col3 = k2, col4 = k4)
+PROPERTIES
+(
+ "format" = "json",
+ "read_json_by_line" = "true"
+)
+FROM KAFKA (...);
+```
+
+##### Query Results
+```
+mysql> SELECT * FROM example_table;
++------+-----------+------+------+
+| col1 | col2      | col3 | col4 |
++------+-----------+------+------+
+| 1    | beijing   | 100  | 1.1  |
+| 2    | shanghai  | 200  | 1.2  |
+| 3    | guangzhou | 300  | 1.3  |
+| 4    | chongqing | NULL | 1.4  |
++------+-----------+------+------+
+```
+
### Adjusting Column Order
Assume the following source data (column names are for illustration only; there is no actual header):
```plain
diff --git a/static/images/load-data-convert-csv-en.png
b/static/images/load-data-convert-csv-en.png
new file mode 100644
index 00000000000..7aeca823049
Binary files /dev/null and b/static/images/load-data-convert-csv-en.png differ
diff --git a/static/images/load-data-convert-csv.png
b/static/images/load-data-convert-csv.png
new file mode 100644
index 00000000000..0b337437c88
Binary files /dev/null and b/static/images/load-data-convert-csv.png differ
diff --git a/static/images/load-data-convert-json1-en.png
b/static/images/load-data-convert-json1-en.png
new file mode 100644
index 00000000000..49e2e51e461
Binary files /dev/null and b/static/images/load-data-convert-json1-en.png differ
diff --git a/static/images/load-data-convert-json1.png
b/static/images/load-data-convert-json1.png
new file mode 100644
index 00000000000..72c347160fa
Binary files /dev/null and b/static/images/load-data-convert-json1.png differ
diff --git a/static/images/load-data-convert-json2-en.png
b/static/images/load-data-convert-json2-en.png
new file mode 100644
index 00000000000..5bfb4db7199
Binary files /dev/null and b/static/images/load-data-convert-json2-en.png differ
diff --git a/static/images/load-data-convert-json2.png
b/static/images/load-data-convert-json2.png
new file mode 100644
index 00000000000..398a9e366eb
Binary files /dev/null and b/static/images/load-data-convert-json2.png differ
diff --git
a/versioned_docs/version-2.1/data-operate/import/load-data-convert.md
b/versioned_docs/version-2.1/data-operate/import/load-data-convert.md
index 386b2ad5b6a..ff6c9ac6ab1 100644
--- a/versioned_docs/version-2.1/data-operate/import/load-data-convert.md
+++ b/versioned_docs/version-2.1/data-operate/import/load-data-convert.md
@@ -114,6 +114,181 @@ Column mapping is used to define the correspondence between source data columns
- The order of source data columns and target table columns is inconsistent
- The number of source data columns and target table columns is inconsistent
+### Implementation Principle
+
+Column mapping implementation can be divided into two steps:
+
+**Step 1: Data Source Parsing** - Parse the raw data into intermediate variables according to the data format.
+**Step 2: Column Mapping and Assignment** - Assign the intermediate variables to the target table columns by column name.
+
+The following are processing flows for three different data formats:
+
+#### Load CSV Format Data
+
+![Load CSV format data](/images/load-data-convert-csv-en.png)
+
+#### Load JSON Format Data with Specified jsonpaths
+
+![Load JSON format data with specified jsonpaths](/images/load-data-convert-json1-en.png)
+
+#### Load JSON Format Data without Specified jsonpaths
+
+![Load JSON format data without specified jsonpaths](/images/load-data-convert-json2-en.png)
+
+### Load JSON Data with Specified jsonpaths
+Assume the following source data:
+```plain
+{"k1":1,"k2":"100","k3":"beijing","k4":1.1}
+{"k1":2,"k2":"200","k3":"shanghai","k4":1.2}
+{"k1":3,"k2":"300","k3":"guangzhou","k4":1.3}
+{"k1":4,"k2":"\\N","k3":"chongqing","k4":1.4}
+```
+
+##### Create Target Table
+```sql
+CREATE TABLE example_table
+(
+ col1 INT,
+ col2 STRING,
+ col3 INT,
+ col4 DOUBLE
+) ENGINE = OLAP
+DUPLICATE KEY(col1)
+DISTRIBUTED BY HASH(col1) BUCKETS 1;
+```
+
+##### Load Data
+- Stream Load
+```shell
+curl --location-trusted -u user:passwd \
+ -H "columns:col1, col3, col2, col4" \
+ -H "jsonpaths:[\"$.k1\", \"$.k2\", \"$.k3\", \"$.k4\"]" \
+ -H "format:json" \
+ -H "read_json_by_line:true" \
+ -T data.json \
+ -X PUT \
+ http://<fe_ip>:<fe_http_port>/api/example_db/example_table/_stream_load
+```
+
+- Broker Load
+```sql
+LOAD LABEL example_db.label_broker
+(
+ DATA INFILE("s3://bucket_name/data.json")
+ INTO TABLE example_table
+ FORMAT AS "json"
+ (col1, col3, col2, col4)
+ PROPERTIES
+ (
+ "jsonpaths" = "[\"$.k1\", \"$.k2\", \"$.k3\", \"$.k4\"]"
+ )
+)
+WITH s3 (...);
+```
+
+- Routine Load
+```sql
+CREATE ROUTINE LOAD example_db.example_routine_load ON example_table
+COLUMNS(col1, col3, col2, col4)
+PROPERTIES
+(
+ "format" = "json",
+ "jsonpaths" = "[\"$.k1\", \"$.k2\", \"$.k3\", \"$.k4\"]",
+ "read_json_by_line" = "true"
+)
+FROM KAFKA (...);
+```
+
+##### Query Results
+```
+mysql> SELECT * FROM example_table;
++------+-----------+------+------+
+| col1 | col2      | col3 | col4 |
++------+-----------+------+------+
+| 1    | beijing   | 100  | 1.1  |
+| 2    | shanghai  | 200  | 1.2  |
+| 3    | guangzhou | 300  | 1.3  |
+| 4    | chongqing | NULL | 1.4  |
++------+-----------+------+------+
+```
+
+### Load JSON Data without Specified jsonpaths
+Assume the following source data:
+```plain
+{"k1":1,"k2":"100","k3":"beijing","k4":1.1}
+{"k1":2,"k2":"200","k3":"shanghai","k4":1.2}
+{"k1":3,"k2":"300","k3":"guangzhou","k4":1.3}
+{"k1":4,"k2":"\\N","k3":"chongqing","k4":1.4}
+```
+
+##### Create Target Table
+```sql
+CREATE TABLE example_table
+(
+ col1 INT,
+ col2 STRING,
+ col3 INT,
+ col4 DOUBLE
+) ENGINE = OLAP
+DUPLICATE KEY(col1)
+DISTRIBUTED BY HASH(col1) BUCKETS 1;
+```
+
+##### Load Data
+- Stream Load
+```shell
+curl --location-trusted -u user:passwd \
+ -H "columns:k1, k3, k2, k4, col1 = k1, col2 = k3, col3 = k2, col4 = k4" \
+ -H "format:json" \
+ -H "read_json_by_line:true" \
+ -T data.json \
+ -X PUT \
+ http://<fe_ip>:<fe_http_port>/api/example_db/example_table/_stream_load
+```
+
+- Broker Load
+```sql
+LOAD LABEL example_db.label_broker
+(
+ DATA INFILE("s3://bucket_name/data.json")
+ INTO TABLE example_table
+ FORMAT AS "json"
+ (k1, k3, k2, k4)
+ SET (
+ col1 = k1,
+ col2 = k3,
+ col3 = k2,
+ col4 = k4
+ )
+)
+WITH s3 (...);
+```
+
+- Routine Load
+```sql
+CREATE ROUTINE LOAD example_db.example_routine_load ON example_table
+COLUMNS(k1, k3, k2, k4, col1 = k1, col2 = k3, col3 = k2, col4 = k4)
+PROPERTIES
+(
+ "format" = "json",
+ "read_json_by_line" = "true"
+)
+FROM KAFKA (...);
+```
+
+##### Query Results
+```
+mysql> SELECT * FROM example_table;
++------+-----------+------+------+
+| col1 | col2      | col3 | col4 |
++------+-----------+------+------+
+| 1    | beijing   | 100  | 1.1  |
+| 2    | shanghai  | 200  | 1.2  |
+| 3    | guangzhou | 300  | 1.3  |
+| 4    | chongqing | NULL | 1.4  |
++------+-----------+------+------+
+```
+
### Adjusting Column Order
Suppose we have the following source data (column names are for illustration
purposes only, and there is no actual header):
diff --git
a/versioned_docs/version-3.0/data-operate/import/load-data-convert.md
b/versioned_docs/version-3.0/data-operate/import/load-data-convert.md
index 386b2ad5b6a..ff6c9ac6ab1 100644
--- a/versioned_docs/version-3.0/data-operate/import/load-data-convert.md
+++ b/versioned_docs/version-3.0/data-operate/import/load-data-convert.md
@@ -114,6 +114,181 @@ Column mapping is used to define the correspondence between source data columns
- The order of source data columns and target table columns is inconsistent
- The number of source data columns and target table columns is inconsistent
+### Implementation Principle
+
+Column mapping implementation can be divided into two steps:
+
+**Step 1: Data Source Parsing** - Parse the raw data into intermediate variables according to the data format.
+**Step 2: Column Mapping and Assignment** - Assign the intermediate variables to the target table columns by column name.
+
+The following are processing flows for three different data formats:
+
+#### Load CSV Format Data
+
+![Load CSV format data](/images/load-data-convert-csv-en.png)
+
+#### Load JSON Format Data with Specified jsonpaths
+
+![Load JSON format data with specified jsonpaths](/images/load-data-convert-json1-en.png)
+
+#### Load JSON Format Data without Specified jsonpaths
+
+![Load JSON format data without specified jsonpaths](/images/load-data-convert-json2-en.png)
+
+### Load JSON Data with Specified jsonpaths
+Assume the following source data:
+```plain
+{"k1":1,"k2":"100","k3":"beijing","k4":1.1}
+{"k1":2,"k2":"200","k3":"shanghai","k4":1.2}
+{"k1":3,"k2":"300","k3":"guangzhou","k4":1.3}
+{"k1":4,"k2":"\\N","k3":"chongqing","k4":1.4}
+```
+
+##### Create Target Table
+```sql
+CREATE TABLE example_table
+(
+ col1 INT,
+ col2 STRING,
+ col3 INT,
+ col4 DOUBLE
+) ENGINE = OLAP
+DUPLICATE KEY(col1)
+DISTRIBUTED BY HASH(col1) BUCKETS 1;
+```
+
+##### Load Data
+- Stream Load
+```shell
+curl --location-trusted -u user:passwd \
+ -H "columns:col1, col3, col2, col4" \
+ -H "jsonpaths:[\"$.k1\", \"$.k2\", \"$.k3\", \"$.k4\"]" \
+ -H "format:json" \
+ -H "read_json_by_line:true" \
+ -T data.json \
+ -X PUT \
+ http://<fe_ip>:<fe_http_port>/api/example_db/example_table/_stream_load
+```
+
+- Broker Load
+```sql
+LOAD LABEL example_db.label_broker
+(
+ DATA INFILE("s3://bucket_name/data.json")
+ INTO TABLE example_table
+ FORMAT AS "json"
+ (col1, col3, col2, col4)
+ PROPERTIES
+ (
+ "jsonpaths" = "[\"$.k1\", \"$.k2\", \"$.k3\", \"$.k4\"]"
+ )
+)
+WITH s3 (...);
+```
+
+- Routine Load
+```sql
+CREATE ROUTINE LOAD example_db.example_routine_load ON example_table
+COLUMNS(col1, col3, col2, col4)
+PROPERTIES
+(
+ "format" = "json",
+ "jsonpaths" = "[\"$.k1\", \"$.k2\", \"$.k3\", \"$.k4\"]",
+ "read_json_by_line" = "true"
+)
+FROM KAFKA (...);
+```
+
+##### Query Results
+```
+mysql> SELECT * FROM example_table;
++------+-----------+------+------+
+| col1 | col2      | col3 | col4 |
++------+-----------+------+------+
+| 1    | beijing   | 100  | 1.1  |
+| 2    | shanghai  | 200  | 1.2  |
+| 3    | guangzhou | 300  | 1.3  |
+| 4    | chongqing | NULL | 1.4  |
++------+-----------+------+------+
+```
+
+### Load JSON Data without Specified jsonpaths
+Assume the following source data:
+```plain
+{"k1":1,"k2":"100","k3":"beijing","k4":1.1}
+{"k1":2,"k2":"200","k3":"shanghai","k4":1.2}
+{"k1":3,"k2":"300","k3":"guangzhou","k4":1.3}
+{"k1":4,"k2":"\\N","k3":"chongqing","k4":1.4}
+```
+
+##### Create Target Table
+```sql
+CREATE TABLE example_table
+(
+ col1 INT,
+ col2 STRING,
+ col3 INT,
+ col4 DOUBLE
+) ENGINE = OLAP
+DUPLICATE KEY(col1)
+DISTRIBUTED BY HASH(col1) BUCKETS 1;
+```
+
+##### Load Data
+- Stream Load
+```shell
+curl --location-trusted -u user:passwd \
+ -H "columns:k1, k3, k2, k4, col1 = k1, col2 = k3, col3 = k2, col4 = k4" \
+ -H "format:json" \
+ -H "read_json_by_line:true" \
+ -T data.json \
+ -X PUT \
+ http://<fe_ip>:<fe_http_port>/api/example_db/example_table/_stream_load
+```
+
+- Broker Load
+```sql
+LOAD LABEL example_db.label_broker
+(
+ DATA INFILE("s3://bucket_name/data.json")
+ INTO TABLE example_table
+ FORMAT AS "json"
+ (k1, k3, k2, k4)
+ SET (
+ col1 = k1,
+ col2 = k3,
+ col3 = k2,
+ col4 = k4
+ )
+)
+WITH s3 (...);
+```
+
+- Routine Load
+```sql
+CREATE ROUTINE LOAD example_db.example_routine_load ON example_table
+COLUMNS(k1, k3, k2, k4, col1 = k1, col2 = k3, col3 = k2, col4 = k4)
+PROPERTIES
+(
+ "format" = "json",
+ "read_json_by_line" = "true"
+)
+FROM KAFKA (...);
+```
+
+##### Query Results
+```
+mysql> SELECT * FROM example_table;
++------+-----------+------+------+
+| col1 | col2      | col3 | col4 |
++------+-----------+------+------+
+| 1    | beijing   | 100  | 1.1  |
+| 2    | shanghai  | 200  | 1.2  |
+| 3    | guangzhou | 300  | 1.3  |
+| 4    | chongqing | NULL | 1.4  |
++------+-----------+------+------+
+```
+
### Adjusting Column Order
Suppose we have the following source data (column names are for illustration
purposes only, and there is no actual header):