This is an automated email from the ASF dual-hosted git repository.
dataroaring pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 7a28d9d90be [Doc](partial update) use a separate config to control the
behavior of newly inserted rows in partial update (#2472)
7a28d9d90be is described below
commit 7a28d9d90be4ed19d095c52a7245cc3c92bbbd22
Author: bobhan1 <[email protected]>
AuthorDate: Sun Aug 10 22:27:20 2025 +0800
[Doc](partial update) use a separate config to control the behavior of
newly inserted rows in partial update (#2472)
## Versions
- [x] dev
- [ ] 3.0
- [ ] 2.1
- [ ] 2.0
## Languages
- [x] Chinese
- [x] English
## Docs Checklist
- [ ] Checked by AI
- [ ] Test Cases Built
---
docs/data-operate/import/handling-messy-data.md | 71 +--------------------
.../import/import-way/stream-load-manual.md | 3 +-
docs/data-operate/update/update-of-unique-model.md | 70 +++++++++++++++++++++
.../data-operate/import/handling-messy-data.md | 72 +---------------------
.../import/import-way/stream-load-manual.md | 3 +-
.../data-operate/update/update-of-unique-model.md | 71 +++++++++++++++++++++
6 files changed, 147 insertions(+), 143 deletions(-)
diff --git a/docs/data-operate/import/handling-messy-data.md
b/docs/data-operate/import/handling-messy-data.md
index c164ab162a8..1f8d384e539 100644
--- a/docs/data-operate/import/handling-messy-data.md
+++ b/docs/data-operate/import/handling-messy-data.md
@@ -16,9 +16,7 @@ This makes it easier to handle data loading problems and
keeps data management s
## Strict Mode
-Strict mode serves two primary purposes:
-1. Filtering out data rows where column type conversion fails during load
-2. Restricting updates to existing columns only in partial column update
scenarios
+The main function of strict mode is to filter out data rows where column type
conversion fails during load.
### Filtering Strategy for Column Type Conversion Failures
@@ -62,73 +60,6 @@ The system employs different strategies based on the strict
mode setting:
3. Although `10` exceeds the range, since its type meets decimal requirements,
strict mode does not affect it.
:::
-### Restricting Partial Column Updates to Existing Columns Only
-
-In strict mode, each row in a partial column update must have its Key already
exist in the table. In non-strict mode, partial column updates can both update
existing rows (where Key exists) and insert new rows (where Key doesn't exist).
-
-For example, given a table structure as follows:
-```sql
-CREATE TABLE user_profile
-(
- id INT,
- name VARCHAR(10),
- age INT,
- city VARCHAR(10),
- balance DECIMAL(9, 0),
- last_access_time DATETIME
-) ENGINE=OLAP
-UNIQUE KEY(id)
-DISTRIBUTED BY HASH(id) BUCKETS 1
-PROPERTIES (
- "enable_unique_key_merge_on_write" = "true"
-);
-```
-
-The table contains one record as follows:
-```sql
-mysql> select * from user_profile;
-+------+-------+------+----------+---------+---------------------+
-| id | name | age | city | balance | last_access_time |
-+------+-------+------+----------+---------+---------------------+
-| 1 | kevin | 18 | shenzhen | 400 | 2023-07-01 12:00:00|
-+------+-------+------+----------+---------+---------------------+
-```
-
-When using Insert Into with strict mode to perform partial column updates, the
insertion will fail because the second and third rows with keys `(3)` and
`(18)` do not exist in the original table:
-```sql
-SET enable_unique_key_partial_update=true;
-SET enable_insert_strict = true;
-INSERT INTO user_profile (id, balance, last_access_time) VALUES
-(1, 500, '2023-07-03 12:00:01'),
-(3, 23, '2023-07-03 12:00:02'),
-(18, 9999999, '2023-07-03 12:00:03');
-ERROR 1105 (HY000): errCode = 2, detailMessage = Insert has filtered data in
strict mode
-```
-
-When using Insert Into with non-strict mode to perform partial column updates:
-```sql
-SET enable_unique_key_partial_update=true;
-SET enable_insert_strict = false;
-INSERT INTO user_profile (id, balance, last_access_time) VALUES
-(1, 500, '2023-07-03 12:00:01'),
-(3, 23, '2023-07-03 12:00:02'),
-(18, 9999999, '2023-07-03 12:00:03');
-```
-
-The existing record will be updated, and two new records will be inserted. For
columns not specified in the insert statement, if a default value is defined,
it will be used; if the column allows NULL values, NULL will be used;
otherwise, the insertion will fail.
-
-The query result is as follows:
-```sql
-mysql> select * from user_profile;
-+------+-------+------+----------+---------+---------------------+
-| id | name | age | city | balance | last_access_time |
-+------+-------+------+----------+---------+---------------------+
-| 1 | kevin | 18 | shenzhen | 500 | 2023-07-03 12:00:01 |
-| 3 | NULL | NULL | NULL | 23 | 2023-07-03 12:00:02 |
-| 18 | NULL | NULL | NULL | 9999999 | 2023-07-03 12:00:03 |
-+------+-------+------+----------+---------+---------------------+
-```
-
### Enable Strict Mode
Strict mode (strict_mode) defaults to False. Here's how to set it for
different load methods:
diff --git a/docs/data-operate/import/import-way/stream-load-manual.md
b/docs/data-operate/import/import-way/stream-load-manual.md
index 9e3853a151b..0be87cfe2f1 100644
--- a/docs/data-operate/import/import-way/stream-load-manual.md
+++ b/docs/data-operate/import/import-way/stream-load-manual.md
@@ -333,6 +333,7 @@ Parameter Description: The default timeout for Stream Load.
The load job will be
| escape | Specify the escape character. It is used to
escape characters that are the same as the enclosure character within a field.
For example, if the data is "a,'b,'c'", and the enclosure is "'", and you want
"b,'c" to be parsed as a single field, you need to specify a single-byte escape
character, such as "", and modify the data to "a,'b','c'". |
| memtable_on_sink_node | Whether to enable MemTable on DataSink node
when loading data, default is false. |
|unique_key_update_mode | The update modes on Unique tables, currently
are only effective for Merge-On-Write Unique tables. Supporting three types:
`UPSERT`, `UPDATE_FIXED_COLUMNS`, and `UPDATE_FLEXIBLE_COLUMNS`. `UPSERT`:
Indicates that data is loaded with upsert semantics; `UPDATE_FIXED_COLUMNS`:
Indicates that data is loaded through partial updates;
`UPDATE_FLEXIBLE_COLUMNS`: Indicates that data is loaded through flexible
partial updates.|
+| partial_update_new_key_behavior | When performing partial column updates or
flexible column updates on Unique tables, this parameter controls how new rows
are handled. There are two types: `APPEND` and `ERROR`.<br/>- `APPEND`: Allows
inserting new row data<br/>- `ERROR`: Fails and reports an error when inserting
new rows |
### Load return value
@@ -1013,7 +1014,7 @@ Doris supports a very rich set of column transformations
and filtering operation
### Enable strict mode import
-The strict_mode attribute is used to set whether the import task runs in
strict mode. This attribute affects the results of column mapping,
transformation, and filtering, and it also controls the behavior of partial
column updates. For specific instructions on strict mode, please refer to the
[Handling Messy Data](../../../data-operate/import/handling-messy-data)
documentation.
+The strict_mode attribute is used to set whether the import task runs in
strict mode. This attribute affects the results of column mapping,
transformation, and filtering. For specific instructions on strict mode, please
refer to the [Handling Messy
Data](../../../data-operate/import/handling-messy-data) documentation.
### Perform partial column updates/flexible partial update during import
diff --git a/docs/data-operate/update/update-of-unique-model.md
b/docs/data-operate/update/update-of-unique-model.md
index 600906d7956..b39cba6f230 100644
--- a/docs/data-operate/update/update-of-unique-model.md
+++ b/docs/data-operate/update/update-of-unique-model.md
@@ -396,3 +396,73 @@ The final data in the table is as follows:
6. Flexible partial updates are not supported on tables with Variant columns.
7. Flexible partial updates are not supported on tables with synchronous
materialized views.
+
+
+## Handling New Rows in Partial Column Updates
+
+The session variable or import property `partial_update_new_key_behavior`
controls the behavior when inserting new rows during partial column updates.
+
+When `partial_update_new_key_behavior=ERROR`, each inserted row must have a
key that already exists in the table. When
`partial_update_new_key_behavior=APPEND`, partial column updates can update
existing rows with matching keys or insert new rows with keys that do not exist
in the table.
+
+For example, consider the following table structure:
+```sql
+CREATE TABLE user_profile
+(
+ id INT,
+ name VARCHAR(10),
+ age INT,
+ city VARCHAR(10),
+ balance DECIMAL(9, 0),
+ last_access_time DATETIME
+) ENGINE=OLAP
+UNIQUE KEY(id)
+DISTRIBUTED BY HASH(id) BUCKETS 1
+PROPERTIES (
+ "enable_unique_key_merge_on_write" = "true"
+);
+```
+
+Suppose the table contains the following data:
+```sql
+mysql> select * from user_profile;
++------+-------+------+----------+---------+---------------------+
+| id | name | age | city | balance | last_access_time |
++------+-------+------+----------+---------+---------------------+
+| 1 | kevin | 18 | shenzhen | 400 | 2023-07-01 12:00:00|
++------+-------+------+----------+---------+---------------------+
+```
+
+If you use `Insert Into` for partial column updates with
`partial_update_new_key_behavior=ERROR`, and try to insert the following data,
the operation will fail because the keys `(3)` and `(18)` do not exist in the
original table:
+```sql
+SET enable_unique_key_partial_update=true;
+SET partial_update_new_key_behavior=ERROR;
+INSERT INTO user_profile (id, balance, last_access_time) VALUES
+(1, 500, '2023-07-03 12:00:01'),
+(3, 23, '2023-07-03 12:00:02'),
+(18, 9999999, '2023-07-03 12:00:03');
+(1105, "errCode = 2, detailMessage = (127.0.0.1)[INTERNAL_ERROR]tablet error:
[E-7003]Can't append new rows in partial update when
partial_update_new_key_behavior is ERROR. Row with key=[3] is not in table.,
host: 127.0.0.1")
+```
+
+If you use `partial_update_new_key_behavior=APPEND` and perform the same
partial column update:
+```sql
+SET enable_unique_key_partial_update=true;
+SET partial_update_new_key_behavior=APPEND;
+INSERT INTO user_profile (id, balance, last_access_time) VALUES
+(1, 500, '2023-07-03 12:00:01'),
+(3, 23, '2023-07-03 12:00:02'),
+(18, 9999999, '2023-07-03 12:00:03');
+```
+
+The existing row will be updated, and two new rows will be inserted. For
columns not specified in the inserted data, if a default value is defined, the
default will be used; if the column is nullable, NULL will be used; otherwise,
the insert will fail.
+
+The query result will be:
+```sql
+mysql> select * from user_profile;
++------+-------+------+----------+---------+---------------------+
+| id | name | age | city | balance | last_access_time |
++------+-------+------+----------+---------+---------------------+
+| 1 | kevin | 18 | shenzhen | 500 | 2023-07-03 12:00:01 |
+| 3 | NULL | NULL | NULL | 23 | 2023-07-03 12:00:02 |
+| 18 | NULL | NULL | NULL | 9999999 | 2023-07-03 12:00:03 |
++------+-------+------+----------+---------+---------------------+
+```
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/handling-messy-data.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/handling-messy-data.md
index 4eaace74f79..4aefab7b584 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/handling-messy-data.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/handling-messy-data.md
@@ -14,9 +14,7 @@
## 严格模式
-严格模式具有两个主要功能:
-1. 对导入过程中发生列类型转换失败的数据行进行过滤。
-2. 在部分列更新场景中,限制只能更新已存在的列。
+严格模式的主要功能是对导入过程中发生列类型转换失败的数据行进行过滤。
### 列类型转换失败的过滤策略
@@ -60,74 +58,6 @@
3. `10` 虽然是一个超过范围的值,但是因为其类型符合 decimal 的要求,所以严格模式对其不产生影响。
:::
-
-### 限定部分列更新只能更新已有的列
-
-在严格模式下,部分列更新插入的每一行数据必须满足该行数据的 Key 在表中已经存在。而在非严格模式下,进行部分列更新时可以更新 Key
已经存在的行,也可以插入 Key 不存在的新行。
-
-例如有表结构如下:
-```sql
-CREATE TABLE user_profile
-(
- id INT,
- name VARCHAR(10),
- age INT,
- city VARCHAR(10),
- balance DECIMAL(9, 0),
- last_access_time DATETIME
-) ENGINE=OLAP
-UNIQUE KEY(id)
-DISTRIBUTED BY HASH(id) BUCKETS 1
-PROPERTIES (
- "enable_unique_key_merge_on_write" = "true"
-);
-```
-
-表中有一条数据如下:
-```sql
-mysql> select * from user_profile;
-+------+-------+------+----------+---------+---------------------+
-| id | name | age | city | balance | last_access_time |
-+------+-------+------+----------+---------+---------------------+
-| 1 | kevin | 18 | shenzhen | 400 | 2023-07-01 12:00:00|
-+------+-------+------+----------+---------+---------------------+
-```
-
-当用户使用严格模式的 Insert Into 部分列更新向表中插入上述数据时,由于开启了严格模式且第二、三行的数据的 key(`(3)`, `(18)`)
不在原表中,所以本次插入会失败:
-```sql
-SET enable_unique_key_partial_update=true;
-SET enable_insert_strict = true;
-INSERT INTO user_profile (id, balance, last_access_time) VALUES
-(1, 500, '2023-07-03 12:00:01'),
-(3, 23, '2023-07-03 12:00:02'),
-(18, 9999999, '2023-07-03 12:00:03');
-ERROR 1105 (HY000): errCode = 2, detailMessage = Insert has filtered data in
strict mode
-```
-
-当用户使用非严格模式的 Insert Into 部分列更新向表中插入如下数据时:
-```sql
-SET enable_unique_key_partial_update=true;
-SET enable_insert_strict = false;
-INSERT INTO user_profile (id, balance, last_access_time) VALUES
-(1, 500, '2023-07-03 12:00:01'),
-(3, 23, '2023-07-03 12:00:02'),
-(18, 9999999, '2023-07-03 12:00:03');
-```
-
-表中原有的一条数据将会被更新,此外还向表中插入了两条新数据。对于插入的数据中用户没有指定的列,如果该列有默认值,则会以默认值填充;否则,如果该列可以为
NULL,则将以 NULL 值填充;否则本次插入不成功。
-
-查询结果如下:
-```sql
-mysql> select * from user_profile;
-+------+-------+------+----------+---------+---------------------+
-| id | name | age | city | balance | last_access_time |
-+------+-------+------+----------+---------+---------------------+
-| 1 | kevin | 18 | shenzhen | 500 | 2023-07-03 12:00:01 |
-| 3 | NULL | NULL | NULL | 23 | 2023-07-03 12:00:02 |
-| 18 | NULL | NULL | NULL | 9999999 | 2023-07-03 12:00:03 |
-+------+-------+------+----------+---------+---------------------+
-```
-
### 开启严格模式
严格模式(strict_mode)默认为 False,以下是各种导入方式的设置方法:
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/stream-load-manual.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/stream-load-manual.md
index 38203808953..4e7cd6649a2 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/stream-load-manual.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/stream-load-manual.md
@@ -329,6 +329,7 @@ Stream Load 操作支持 HTTP 分块导入(HTTP chunked)与 HTTP 非分块
| escape | 指定转义符。用于转义在字段中出现的与包围符相同的字符。例如数据为
"a,'b,'c'",包围符为 "'",希望 "b,'c 被作为一个字段解析,则需要指定单字节转义符,例如"\",将数据修改为 "a,'b,\'c'"。 |
| memtable_on_sink_node | 导入数据的时候是否开启 MemTable 前移,默认为 false。 |
| unique_key_update_mode | Unique 表上的更新模式,目前仅对 Merge-On-Write Unique
表有效,一共支持三种类型 `UPSERT`, `UPDATE_FIXED_COLUMNS`, `UPDATE_FLEXIBLE_COLUMNS`。
`UPSERT`: 表示以 upsert 语义导入数据; `UPDATE_FIXED_COLUMNS`:
表示以[部分列更新](../../../data-operate/update/update-of-unique-model)的方式导入数据;
`UPDATE_FLEXIBLE_COLUMNS`:
表示以[灵活部分列更新](../../../data-operate/update/update-of-unique-model)的方式导入数据|
+| partial_update_new_key_behavior | Unique 表上进行部分列更新或灵活列更新时,对新插入行的处理方式。有两种类型
`APPEND`, `ERROR`。<br/>-`APPEND`: 允许插入新行数据<br/>-`ERROR`: 插入新行时倒入失败并报错 |
### 导入返回值
@@ -1017,7 +1018,7 @@ Doris 可以在导入语句中支持非常丰富的列转换和过滤操作。
### 启用严格模式导入
-`strict_mode`
属性用于设置导入任务是否运行在严格模式下。该属性会对列映射、转换和过滤的结果产生影响,它同时也将控制部分列更新的行为。关于严格模式的具体说明,可参阅
[严格模式](../handling-messy-data#严格模式) 文档。
+`strict_mode` 属性用于设置导入任务是否运行在严格模式下。该属性会对列映射、转换和过滤的结果产生影响。关于严格模式的具体说明,可参阅
[严格模式](../handling-messy-data#严格模式) 文档。
### 导入时进行部分列更新/灵活部分列更新
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/update-of-unique-model.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/update-of-unique-model.md
index 101a550c452..b60d93fa1ed 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/update-of-unique-model.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/update-of-unique-model.md
@@ -388,3 +388,74 @@ PROPERTIES(
6. 不支持在有 Variant 列的表上进行灵活列更新。
7. 不支持在有同步物化视图的表上进行灵活列更新。
+
+
+## 部分列更新/灵活列更新中对新插入的行的处理
+
+session variable或导入属性`partial_update_new_key_behavior`用于控制部分列更新和灵活列更新中插入的新行的行为。
+
+当`partial_update_new_key_behavior=ERROR`时,插入的每一行数据必须满足该行数据的 Key
在表中已经存在。而当`partial_update_new_key_behavior=APPEND`时,进行部分列更新或灵活列更新时可以更新 Key
已经存在的行,也可以插入 Key 不存在的新行。
+
+例如有表结构如下:
+```sql
+CREATE TABLE user_profile
+(
+ id INT,
+ name VARCHAR(10),
+ age INT,
+ city VARCHAR(10),
+ balance DECIMAL(9, 0),
+ last_access_time DATETIME
+) ENGINE=OLAP
+UNIQUE KEY(id)
+DISTRIBUTED BY HASH(id) BUCKETS 1
+PROPERTIES (
+ "enable_unique_key_merge_on_write" = "true"
+);
+```
+
+表中有一条数据如下:
+```sql
+mysql> select * from user_profile;
++------+-------+------+----------+---------+---------------------+
+| id | name | age | city | balance | last_access_time |
++------+-------+------+----------+---------+---------------------+
+| 1 | kevin | 18 | shenzhen | 400 | 2023-07-01 12:00:00|
++------+-------+------+----------+---------+---------------------+
+```
+
+当用户在`partial_update_new_key_behavior=ERROR`的情况下使用 Insert Into
部分列更新向表中插入上述数据时,由于第二、三行的数据的 key(`(3)`, `(18)`) 不在原表中,所以本次插入会失败:
+```sql
+SET enable_unique_key_partial_update=true;
+SET partial_update_new_key_behavior=ERROR;
+INSERT INTO user_profile (id, balance, last_access_time) VALUES
+(1, 500, '2023-07-03 12:00:01'),
+(3, 23, '2023-07-03 12:00:02'),
+(18, 9999999, '2023-07-03 12:00:03');
+(1105, "errCode = 2, detailMessage = (127.0.0.1)[INTERNAL_ERROR]tablet error:
[E-7003]Can't append new rows in partial update when
partial_update_new_key_behavior is ERROR. Row with key=[3] is not in table.,
host: 127.0.0.1")
+```
+
+当用在`partial_update_new_key_behavior=APPEND`的情况下使用 Insert Into 部分列更新向表中插入如下数据时:
+```sql
+SET enable_unique_key_partial_update=true;
+SET partial_update_new_key_behavior=APPEND;
+INSERT INTO user_profile (id, balance, last_access_time) VALUES
+(1, 500, '2023-07-03 12:00:01'),
+(3, 23, '2023-07-03 12:00:02'),
+(18, 9999999, '2023-07-03 12:00:03');
+```
+
+表中原有的一条数据将会被更新,此外还向表中插入了两条新数据。对于插入的数据中用户没有指定的列,如果该列有默认值,则会以默认值填充;否则,如果该列可以为
NULL,则将以 NULL 值填充;否则本次插入不成功。
+
+查询结果如下:
+```sql
+mysql> select * from user_profile;
++------+-------+------+----------+---------+---------------------+
+| id | name | age | city | balance | last_access_time |
++------+-------+------+----------+---------+---------------------+
+| 1 | kevin | 18 | shenzhen | 500 | 2023-07-03 12:00:01 |
+| 3 | NULL | NULL | NULL | 23 | 2023-07-03 12:00:02 |
+| 18 | NULL | NULL | NULL | 9999999 | 2023-07-03 12:00:03 |
++------+-------+------+----------+---------+---------------------+
+```
+
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]