This is an automated email from the ASF dual-hosted git repository.
zhangchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 8392376723 [doc](update) add a table to compare the performance
characteristics and feature support of different data models. (#1311)
8392376723 is described below
commit 8392376723a884f6649d19ad0749b3191a66a71c
Author: zhannngchen <[email protected]>
AuthorDate: Fri Nov 8 15:21:39 2024 +0800
[doc](update) add a table to compare the performance characteristics and
feature support of different data models. (#1311)
# Versions
- [x] dev
- [x] 3.0
- [x] 2.1
- [x] 2.0
# Languages
- [x] Chinese
- [x] English
---------
Co-authored-by: Chen Zhang <[email protected]>
---
docs/data-operate/update/update-overview.md | 50 ++++++++++++++++++----
.../current/data-operate/update/update-overview.md | 50 +++++++++++++++++-----
.../data-operate/update/update-overview.md | 50 +++++++++++++++++-----
.../data-operate/update/update-overview.md | 50 +++++++++++++++++-----
.../data-operate/update/update-overview.md | 48 +++++++++++++++++----
.../data-operate/update/update-overview.md | 50 ++++++++++++++++++----
.../data-operate/update/update-overview.md | 50 ++++++++++++++++++----
.../data-operate/update/update-overview.md | 50 ++++++++++++++++++----
8 files changed, 327 insertions(+), 71 deletions(-)
diff --git a/docs/data-operate/update/update-overview.md
b/docs/data-operate/update/update-overview.md
index 039a33569c..aff46a96c5 100644
--- a/docs/data-operate/update/update-overview.md
+++ b/docs/data-operate/update/update-overview.md
@@ -58,23 +58,37 @@ Starting from Doris 2.1, MoW is the default mode for the
unique key model. So, i
### Two Update Methods in Unique Key Model
-- Update statement: This method is used to update a specific column and is
suitable for infrequent updates with a small amount of data.
+#### `UPDATE` statement
-- Batch update based on load: Doris supports various load methods such as
Stream Load, Broker Load, Routine Load, and Insert Into. For unique key tables,
all load have the "UPSERT" semantics, meaning that if a row with the same key
does not exist, it will be inserted, and if it already exists, it will be
updated.
+Whether it is MoR or MoW, the semantics ensure that the specified columns are
updated. The time taken for a single UPDATE operation increases with the amount
of data being updated.
-- If all columns are updated, MoR and MoW have the same semantics, which is to
replace all value columns for the same key.
+#### Batch update based on load
-- If only some columns are updated, the default semantics for MoR and MoW are
the same. In this case, the missing columns in the table schema will be updated
with their default values, overwriting the old records.
+Doris supports multiple data load methods, including Stream Load, Broker Load,
Routine Load, and Insert Into. For primary key tables, all load operations
default to "UPSERT" semantics: if a record with the same key does not exist, it
is inserted; if it already exists, it is updated. There are two types of
updates: full row updates and partial column updates.
-- If only some columns are updated and MoW is used in the unique key model,
and the MySQL session variable "partial_columns" is set to true, or the HTTP
header "partial_columns" is set to true, the missing columns will be updated
with the corresponding missing column values from the existing record, instead
of using the default values from the table schema.
+- **Full Row Update**: Updates for Unique Key tables default to full row
updates. During data loading, users can choose to provide either all fields or
only part of them. If only partial fields are provided, Doris will fill in the
missing fields with default values to form a complete record for updating.
+
+- **Partial Column Update**: Unique Key MoW supports partial column updates.
Users can enable this feature by setting the session variable
`enable_unique_key_partial_update = true` or by specifying
`partial_columns:true` in the HTTP Header. When enabled, if a record with the
given primary key exists, only the specified fields are updated; if the primary
key does not exist, the missing fields are filled with default values.
We will provide detailed explanations of these two update methods in the
documentation: [Update in Unique Key Model](../update/unique-update) and [Load
Update in Unique Key Model](../update/update-of-unique-model).
### Update Transactions in Unique Key Model
-Whether you use the update statement or the batch update based on load, there
may be multiple update statements or load jobs in progress. In such cases, it
is important to ensure the effectiveness of multiple updates, maintain
atomicity, and prevent data inconsistency. This is where update transactions in
the unique key model come into play.
+#### Updating Data Using the `UPDATE` Statement
+
+By default, Doris does not allow multiple `UPDATE` operations on the same
table to occur concurrently. The `UPDATE` statement uses table-level locking to
ensure transactional consistency.
+
+Users can adjust concurrency limits by modifying the FE configuration
`enable_concurrent_update=true`. When concurrency limits are relaxed, the
`UPDATE` statement will no longer provide transactional guarantees.
+
+#### Batch Updates Based on Load
+
+Doris provides atomicity for all load update operations—each data load will
either be fully applied or fully rolled back.
+
+For concurrent load updates, Doris determines the order of concurrent updates
using an internal version control system (assigned based on the order of
completed loading), using an MVCC mechanism.
+
+Since the commit order of multiple concurrent load updates may be
unpredictable, if these concurrent load jobs involve updates to the same
primary key, the order in which they take effect is also uncertain. As a
result, the final visible outcome may be indeterminate. To address this issue,
Doris provides a `sequence` column mechanism, allowing users to specify a
version for each row in concurrent load updates, thus ensuring determinism in
the outcome of concurrent updates.
-The documentation on update transactions in the unique key model will cover
these aspects. In this document, we will focus on how to control the
effectiveness of updates by introducing the hidden column
__**DORIS_SEQUENCE_COL__, allowing developers to coordinate and achieve better
update transactions.
+For more detailed information on transaction mechanisms, refer to the
documentation on [Transactional Updates in the Primary Key
Model](../update/unique-update-transaction.md).
## Update in Aggregate Model
@@ -82,4 +96,24 @@ The update in the aggregate model refers to the process of
generating new aggreg
New Agg Value = Agg Func ( Old Agg Value, New Column Value)
-The update in the aggregate model is only supported through load methods and
does not support the use of Update statements. When defining a table in the
aggregate model, if the aggregation function for the value column is defined as
REPLACE_IF_NULL, it indirectly achieves partial column update capabilities
similar to the unique key model. For more details, please refer to the
documentation on [Load Update in the Aggregate
Model](../update/update-of-aggregate-model).
\ No newline at end of file
+The update in the aggregate model is only supported through load methods and
does not support the use of Update statements. When defining a table in the
aggregate model, if the aggregation function for the value column is defined as
REPLACE_IF_NULL, it indirectly achieves partial column update capabilities
similar to the unique key model. For more details, please refer to the
documentation on [Load Update in the Aggregate
Model](../update/update-of-aggregate-model).
+
+## Comparison of Update Capabilities for Different Models/Implementations
+
+### Performance Comparison
+| | Unique Key MoW
| Unique Key MoR
[...]
+|----------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------
[...]
+| Import Speed | Deduplication is performed during import. Small-batch
real-time writes incur approximately 10%-20% performance loss compared to MoR,
while large-batch imports (e.g., tens or hundreds of millions of records) have
about 30%-50% performance loss compared to MoR.
| Similar to Duplicate Key
[...]
+| Query Speed | Similar to Duplicate Key
| Requires deduplication during queries, with query time approximately 3-10
times that of M [...]
+| Predicate Pushdown | Supported
| Not Supported
[...]
+| Resource Consumption | - **Import Resource Consumption**: Consumes
approximately 10%-30% more CPU compared to Duplicate Key/Unique Key MoR.<br />
- **Query Resource Consumption**: Similar to Duplicate Key with no additional
resource consumption.<br /> - **Compaction Resource Consumption**: Higher
memory and CPU usage compared to Duplicate Key, specific usage depends on data
characteristics and volume. | - **Import Resource Consumption**: Similar to
Duplicate Key with no additional reso [...]
+
+### Feature Support Comparison
+| | Unique Key MoW | Unique Key MoR | Aggregate Key
|
+|----------------|----------------|----------------|------------------------------------|
+| UPDATE | Supported | Supported | Not Supported
|
+| DELETE | Supported | Supported | Not Supported
|
+| sequence column| Supported | Supported | Not Supported
|
+| delete_sign | Supported | Supported | Not Supported
|
+| Partial Column Updates | Supported | Not Supported | Supported(can't update
null value) |
+| Inverted Index | Supported | Not Supported | Not Supported
|
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/update-overview.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/update-overview.md
index 6cfa9ca7de..f7cc267b6d 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/update-overview.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/update-overview.md
@@ -58,27 +58,37 @@ PROPERTIES (
### 主键模型的两种更新方式
-- 使用 Update 语句更新
+#### 使用`UPDATE`语句更新
-无论是 MoR 还是 MoW,语义都是完成对指定列的更新。这个适合少量数据,不频繁的更新。
+无论是 MoR 还是 MoW,语义都是完成对指定列的更新。单次UPDATE的耗时会随着被更新的数据量的增加而增长。
-- 基于导入的批量更新
+#### 基于导入的批量更新
-Doris 支持 Stream Load、Broker Load、Routine Load、Insert Into
等多种导入方式,对于主键表,所有的导入都是“UPSERT”的语义,即如果相同 Key 的行不存在,则插入。对于已经存在的记录,则进行更新。
+Doris 支持多种数据导入方式,包括 Stream Load、Broker Load、Routine Load 以及 Insert Into
等。对于主键表,所有导入操作默认采用“UPSERT”语义:当相同主键的记录不存在时执行插入操作,若记录已存在则进行更新操作。更新方式包括整行更新和部分列更新:
-- 如果更新的是所有列,MoR 和 MoW 的语义是一样的,都是覆盖相同 Key 的所有 Value 列。
+- **整行更新**:Unique Key
表的更新默认为整行更新。在导入数据时,用户可以选择提供所有字段,或仅提供部分字段。当用户只提供部分字段时,Doris
会用默认值填充缺失的字段,生成完整记录并进行更新。
-- 如果更新的是部分列,MoR 和 MoW 的默认语义是一样的,即使用表 Schema 中缺失列的默认值作为缺失列的值,去覆盖旧的记录。
-
-- 如果更新的是部分列,主键模型采用的是 MoW,并且设置了 MySQL Session 变量 partial_columns = true 或者 HTTP
Header partial_columns:true,则被更新的缺失列的值,不是再使用表 Schema 中缺失列的默认值,而是已经存在记录的对应缺失列的值。
+- **部分列更新**:Unique Key MoW 支持部分列更新。用户可以通过设置会话变量
`enable_unique_key_partial_update = true` 或在 HTTP Header 中指定
`partial_columns:true` 来启用此功能。开启后,若导入数据的主键已存在,则仅更新指定的部分字段;若主键不存在,则使用默认值填充缺失字段。
我们会分别在文档 [主键模型的 Update 更新](../update/unique-update) 和
[主键模型的导入更新](../update/update-of-unique-model) 详细介绍两种更新方式。
### 主键模型的更新事务
-无论是使用 Update
语句更新,还是基于导入的批量更新,都可能有多个更新语句或者导入作业在进行,那么多个更新如何生效,如何确保更新的原子性,如何防止数据的不一致,这就是主键模型的更新事务。
+#### 使用`UPDATE`语句更新数据
+
+默认情况下,Doris 不允许在同一时间对同一张表进行多个`UPDATE`操作。`UPDATE`语句通过表级锁来确保事务的一致性。
+
+用户可以通过修改 FE
配置`enable_concurrent_update=true`来调整并发限制。当放宽并发限制时,`UPDATE`语句将不再提供事务保证。
+
+#### 基于导入的批量更新
-主键模型的更新事务
文档会介绍这块内容。在这篇文档中,我们会重点介绍通过引入隐藏列__**DORIS_SEQUENCE_COL__,**如何实现让开发者自己控制哪一个更新生效,这样通过与开发者协同,可以实现更好的更新事务。
+Doris 对所有导入更新操作提供原子性保障,即每次导入数据要么全部成功应用,要么全部失败回滚。
+
+对于并发导入更新,Doris 基于系统内部版本控制(按照导入完成提交的顺序进行分配),使用 MVCC 机制确定并发更新的顺序。
+
+由于多个并发导入更新的提交顺序可能无法预期,若这些并发导入涉及相同主键的更新,则其生效顺序也无法预知,最终的可见结果会因此存在不确定性。为解决此问题,Doris
提供了 sequence 列机制,允许用户在并发导入更新时为每一行数据指定版本,以便明确控制并发更新的结果顺序,实现确定性。
+
+我们将在文档[主键模型的更新事务](../update/unique-update-transaction.md) 中对事务机制进行详细介绍
## 聚合(Aggregate)模型的更新
@@ -87,3 +97,23 @@ Doris 支持 Stream Load、Broker Load、Routine Load、Insert Into
等多种导
New Agg Value = Agg Func ( Old Agg Value, New Column Value)
聚合模型只支持基于导入方式的更新,不支持使用 Update 语句更新。在定义聚合模型表的时候,如果把 value 列的聚合函数定义为
REPLACE_IF_NULL,也可以间接实现类似主键表的部分列更新能力。更多内容,请查看
[聚合模型的导入更新](../update/update-of-aggregate-model)。
+
+## 不同模型/实现的更新能力对比
+
+### 性能对比
+| | Unique Key MoW
| Unique Key MoR
| Aggregate Key |
+|----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------|
+| 导入速度 |
导入过程中进行数据去重,小批量实时写入相比MoR约有10%-20%的性能损失,大批量导入(例如千万级/亿级数据)相比MoR约有30%-50%的性能损失
| 与Duplicate Key接近
| 与Duplicate Key接近 |
+| 查询速度 | 与Duplicate Key接近
| 需要在查询期间进行去重,查询耗时约为 MoW 的3-10倍
|
如果聚合函数为REPLACE/REPLACE_IF_NOT_NULL,查询速度与MoR接近 |
+| 谓词下推 | 支持
| 不支持
| 不支持 |
+| 资源消耗 | - **导入资源消耗**:相比Duplicate Key/Unique Key
MoR,约额外消耗约10%-30%的CPU。<br /> - **查询资源消耗**:与Duplicate Key接近,无额外资源消耗。<br /> -
**Compaction资源消耗**:相比Duplicate Key,消耗更多内存和CPU,具体取决于数据特征和数据量。 | -
**导入资源消耗**:与Duplicate Key相近,无额外资源消耗。<br /> - **查询资源消耗**:相比Duplicate Key/Unique
Key MoW,查询时额外消耗更多的CPU和内存。<br /> - **Compaction资源消耗**:相比Duplicate
Key,需更多内存和CPU,具体数值取决于数据特征和数据量。 | 与Unique Key MoR相同 |
+
+### 功能支持对比
+| | Unique Key MoW | Unique Key MoR | Aggregate Key |
+|----------------|----------------|----------------|----------------|
+| UPDATE |支持|支持| 不支持 |
+| DELETE |支持|支持| 不支持 |
+| sequence列 |支持|支持| 不支持 |
+| delete_sign |支持|支持| 不支持 |
+| 部分列更新 |支持|不支持| 支持(但无法更新null值) |
+| 倒排索引 |支持|不支持| 不支持 |
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/data-operate/update/update-overview.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/data-operate/update/update-overview.md
index 6cfa9ca7de..7a21d4ffce 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/data-operate/update/update-overview.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/data-operate/update/update-overview.md
@@ -58,27 +58,37 @@ PROPERTIES (
### 主键模型的两种更新方式
-- 使用 Update 语句更新
+#### 使用`UPDATE`语句更新
-无论是 MoR 还是 MoW,语义都是完成对指定列的更新。这个适合少量数据,不频繁的更新。
+无论是 MoR 还是 MoW,语义都是完成对指定列的更新。单次UPDATE的耗时会随着被更新的数据量的增加而增长。
-- 基于导入的批量更新
+#### 基于导入的批量更新
-Doris 支持 Stream Load、Broker Load、Routine Load、Insert Into
等多种导入方式,对于主键表,所有的导入都是“UPSERT”的语义,即如果相同 Key 的行不存在,则插入。对于已经存在的记录,则进行更新。
+Doris 支持多种数据导入方式,包括 Stream Load、Broker Load、Routine Load 以及 Insert Into
等。对于主键表,所有导入操作默认采用“UPSERT”语义:当相同主键的记录不存在时执行插入操作,若记录已存在则进行更新操作。更新方式包括整行更新和部分列更新:
-- 如果更新的是所有列,MoR 和 MoW 的语义是一样的,都是覆盖相同 Key 的所有 Value 列。
+- **整行更新**:Unique Key
表的更新默认为整行更新。在导入数据时,用户可以选择提供所有字段,或仅提供部分字段。当用户只提供部分字段时,Doris
会用默认值填充缺失的字段,生成完整记录并进行更新。
-- 如果更新的是部分列,MoR 和 MoW 的默认语义是一样的,即使用表 Schema 中缺失列的默认值作为缺失列的值,去覆盖旧的记录。
-
-- 如果更新的是部分列,主键模型采用的是 MoW,并且设置了 MySQL Session 变量 partial_columns = true 或者 HTTP
Header partial_columns:true,则被更新的缺失列的值,不是再使用表 Schema 中缺失列的默认值,而是已经存在记录的对应缺失列的值。
+- **部分列更新**:Unique Key MoW 支持部分列更新。用户可以通过设置会话变量
`enable_unique_key_partial_update = true` 或在 HTTP Header 中指定
`partial_columns:true` 来启用此功能。开启后,若导入数据的主键已存在,则仅更新指定的部分字段;若主键不存在,则使用默认值填充缺失字段。
我们会分别在文档 [主键模型的 Update 更新](../update/unique-update) 和
[主键模型的导入更新](../update/update-of-unique-model) 详细介绍两种更新方式。
### 主键模型的更新事务
-无论是使用 Update
语句更新,还是基于导入的批量更新,都可能有多个更新语句或者导入作业在进行,那么多个更新如何生效,如何确保更新的原子性,如何防止数据的不一致,这就是主键模型的更新事务。
+#### 使用`UPDATE`语句更新数据
+
+默认情况下,Doris 不允许在同一时间对同一张表进行多个`UPDATE`操作。`UPDATE`语句通过表级锁来确保事务的一致性。
+
+用户可以通过修改 FE
配置`enable_concurrent_update=true`来调整并发限制。当放宽并发限制时,`UPDATE`语句将不再提供事务保证。
+
+#### 基于导入的批量更新
-主键模型的更新事务
文档会介绍这块内容。在这篇文档中,我们会重点介绍通过引入隐藏列__**DORIS_SEQUENCE_COL__,**如何实现让开发者自己控制哪一个更新生效,这样通过与开发者协同,可以实现更好的更新事务。
+Doris 对所有导入更新操作提供原子性保障,即每次导入数据要么全部成功应用,要么全部失败回滚。
+
+对于并发导入更新,Doris 基于系统内部版本控制(按照导入完成提交的顺序进行分配),使用 MVCC 机制确定并发更新的顺序。
+
+由于多个并发导入更新的提交顺序可能无法预期,若这些并发导入涉及相同主键的更新,则其生效顺序也无法预知,最终的可见结果会因此存在不确定性。为解决此问题,Doris
提供了 sequence 列机制,允许用户在并发导入更新时为每一行数据指定版本,以便明确控制并发更新的结果顺序,实现确定性。
+
+我们将在文档[主键模型的更新事务](../update/unique-update-transaction.md) 中对事务机制进行详细介绍
## 聚合(Aggregate)模型的更新
@@ -87,3 +97,23 @@ Doris 支持 Stream Load、Broker Load、Routine Load、Insert Into
等多种导
New Agg Value = Agg Func ( Old Agg Value, New Column Value)
聚合模型只支持基于导入方式的更新,不支持使用 Update 语句更新。在定义聚合模型表的时候,如果把 value 列的聚合函数定义为
REPLACE_IF_NULL,也可以间接实现类似主键表的部分列更新能力。更多内容,请查看
[聚合模型的导入更新](../update/update-of-aggregate-model)。
+
+## 不同模型/实现的更新能力对比
+
+### 性能对比
+| | Unique Key MoW
| Unique Key MoR | Aggregate Key |
+|----------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------|---------------|
+| 导入速度 |
导入过程中进行数据去重,小批量实时写入相比MoR约有10%-20%的性能损失,大批量导入(例如千万级/亿级数据)相比MoR约有30%-50%的性能损失
| 与Duplicate Key接近 | 与Duplicate Key接近 |
+| 查询速度 | 与Duplicate Key接近
| 需要在查询期间进行去重,查询耗时约为 MoW 的3-10倍 |
如果聚合函数为REPLACE/REPLACE_IF_NOT_NULL,查询速度与MoR接近 |
+| 谓词下推 | 支持
| 不支持 | 不支持 |
+| 资源消耗 | - **导入资源消耗**:相比Duplicate Key/Unique Key
MoR,约额外消耗约10%-30%的CPU。<br /> - **查询资源消耗**:与Duplicate Key接近,无额外资源消耗。<br /> -
**Compaction资源消耗**:相比Duplicate Key,消耗更多内存和CPU,具体取决于数据特征和数据量。 | -
**导入资源消耗**:与Duplicate Key相近,无额外资源消耗。<br /> - **查询资源消耗**:相比Duplicate Key/Unique
Key MoW,查询时额外消耗更多的CPU和内存。<br /> - **Compaction资源消耗**:相比Duplicate
Key,需更多内存和CPU,具体数值取决于数据特征和数据量。 | 与Unique Key MoR相同 |
+
+### 功能支持对比
+| | Unique Key MoW | Unique Key MoR | Aggregate Key |
+|----------------|----------------|----------------|---------------|
+| UPDATE |支持|支持|不支持|
+| DELETE |支持|支持|不支持|
+| sequence列 |支持|支持|不支持|
+| delete_sign |支持|支持|不支持|
+| 部分列更新 |支持|不支持| 支持(但无法更新null值) |
+| 倒排索引 |支持|不支持|不支持|
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/update/update-overview.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/update/update-overview.md
index 6cfa9ca7de..7a21d4ffce 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/update/update-overview.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/update/update-overview.md
@@ -58,27 +58,37 @@ PROPERTIES (
### 主键模型的两种更新方式
-- 使用 Update 语句更新
+#### 使用`UPDATE`语句更新
-无论是 MoR 还是 MoW,语义都是完成对指定列的更新。这个适合少量数据,不频繁的更新。
+无论是 MoR 还是 MoW,语义都是完成对指定列的更新。单次UPDATE的耗时会随着被更新的数据量的增加而增长。
-- 基于导入的批量更新
+#### 基于导入的批量更新
-Doris 支持 Stream Load、Broker Load、Routine Load、Insert Into
等多种导入方式,对于主键表,所有的导入都是“UPSERT”的语义,即如果相同 Key 的行不存在,则插入。对于已经存在的记录,则进行更新。
+Doris 支持多种数据导入方式,包括 Stream Load、Broker Load、Routine Load 以及 Insert Into
等。对于主键表,所有导入操作默认采用“UPSERT”语义:当相同主键的记录不存在时执行插入操作,若记录已存在则进行更新操作。更新方式包括整行更新和部分列更新:
-- 如果更新的是所有列,MoR 和 MoW 的语义是一样的,都是覆盖相同 Key 的所有 Value 列。
+- **整行更新**:Unique Key
表的更新默认为整行更新。在导入数据时,用户可以选择提供所有字段,或仅提供部分字段。当用户只提供部分字段时,Doris
会用默认值填充缺失的字段,生成完整记录并进行更新。
-- 如果更新的是部分列,MoR 和 MoW 的默认语义是一样的,即使用表 Schema 中缺失列的默认值作为缺失列的值,去覆盖旧的记录。
-
-- 如果更新的是部分列,主键模型采用的是 MoW,并且设置了 MySQL Session 变量 partial_columns = true 或者 HTTP
Header partial_columns:true,则被更新的缺失列的值,不是再使用表 Schema 中缺失列的默认值,而是已经存在记录的对应缺失列的值。
+- **部分列更新**:Unique Key MoW 支持部分列更新。用户可以通过设置会话变量
`enable_unique_key_partial_update = true` 或在 HTTP Header 中指定
`partial_columns:true` 来启用此功能。开启后,若导入数据的主键已存在,则仅更新指定的部分字段;若主键不存在,则使用默认值填充缺失字段。
我们会分别在文档 [主键模型的 Update 更新](../update/unique-update) 和
[主键模型的导入更新](../update/update-of-unique-model) 详细介绍两种更新方式。
### 主键模型的更新事务
-无论是使用 Update
语句更新,还是基于导入的批量更新,都可能有多个更新语句或者导入作业在进行,那么多个更新如何生效,如何确保更新的原子性,如何防止数据的不一致,这就是主键模型的更新事务。
+#### 使用`UPDATE`语句更新数据
+
+默认情况下,Doris 不允许在同一时间对同一张表进行多个`UPDATE`操作。`UPDATE`语句通过表级锁来确保事务的一致性。
+
+用户可以通过修改 FE
配置`enable_concurrent_update=true`来调整并发限制。当放宽并发限制时,`UPDATE`语句将不再提供事务保证。
+
+#### 基于导入的批量更新
-主键模型的更新事务
文档会介绍这块内容。在这篇文档中,我们会重点介绍通过引入隐藏列__**DORIS_SEQUENCE_COL__,**如何实现让开发者自己控制哪一个更新生效,这样通过与开发者协同,可以实现更好的更新事务。
+Doris 对所有导入更新操作提供原子性保障,即每次导入数据要么全部成功应用,要么全部失败回滚。
+
+对于并发导入更新,Doris 基于系统内部版本控制(按照导入完成提交的顺序进行分配),使用 MVCC 机制确定并发更新的顺序。
+
+由于多个并发导入更新的提交顺序可能无法预期,若这些并发导入涉及相同主键的更新,则其生效顺序也无法预知,最终的可见结果会因此存在不确定性。为解决此问题,Doris
提供了 sequence 列机制,允许用户在并发导入更新时为每一行数据指定版本,以便明确控制并发更新的结果顺序,实现确定性。
+
+我们将在文档[主键模型的更新事务](../update/unique-update-transaction.md) 中对事务机制进行详细介绍
## 聚合(Aggregate)模型的更新
@@ -87,3 +97,23 @@ Doris 支持 Stream Load、Broker Load、Routine Load、Insert Into
等多种导
New Agg Value = Agg Func ( Old Agg Value, New Column Value)
聚合模型只支持基于导入方式的更新,不支持使用 Update 语句更新。在定义聚合模型表的时候,如果把 value 列的聚合函数定义为
REPLACE_IF_NULL,也可以间接实现类似主键表的部分列更新能力。更多内容,请查看
[聚合模型的导入更新](../update/update-of-aggregate-model)。
+
+## 不同模型/实现的更新能力对比
+
+### 性能对比
+| | Unique Key MoW
| Unique Key MoR | Aggregate Key |
+|----------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------|---------------|
+| 导入速度 |
导入过程中进行数据去重,小批量实时写入相比MoR约有10%-20%的性能损失,大批量导入(例如千万级/亿级数据)相比MoR约有30%-50%的性能损失
| 与Duplicate Key接近 | 与Duplicate Key接近 |
+| 查询速度 | 与Duplicate Key接近
| 需要在查询期间进行去重,查询耗时约为 MoW 的3-10倍 |
如果聚合函数为REPLACE/REPLACE_IF_NOT_NULL,查询速度与MoR接近 |
+| 谓词下推 | 支持
| 不支持 | 不支持 |
+| 资源消耗 | - **导入资源消耗**:相比Duplicate Key/Unique Key
MoR,约额外消耗约10%-30%的CPU。<br /> - **查询资源消耗**:与Duplicate Key接近,无额外资源消耗。<br /> -
**Compaction资源消耗**:相比Duplicate Key,消耗更多内存和CPU,具体取决于数据特征和数据量。 | -
**导入资源消耗**:与Duplicate Key相近,无额外资源消耗。<br /> - **查询资源消耗**:相比Duplicate Key/Unique
Key MoW,查询时额外消耗更多的CPU和内存。<br /> - **Compaction资源消耗**:相比Duplicate
Key,需更多内存和CPU,具体数值取决于数据特征和数据量。 | 与Unique Key MoR相同 |
+
+### 功能支持对比
+| | Unique Key MoW | Unique Key MoR | Aggregate Key |
+|----------------|----------------|----------------|---------------|
+| UPDATE |支持|支持|不支持|
+| DELETE |支持|支持|不支持|
+| sequence列 |支持|支持|不支持|
+| delete_sign |支持|支持|不支持|
+| 部分列更新 |支持|不支持| 支持(但无法更新null值) |
+| 倒排索引 |支持|不支持|不支持|
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/update/update-overview.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/update/update-overview.md
index 6cfa9ca7de..9591eba331 100644
---
a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/update/update-overview.md
+++
b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/update/update-overview.md
@@ -58,27 +58,37 @@ PROPERTIES (
### 主键模型的两种更新方式
-- 使用 Update 语句更新
+#### 使用`UPDATE`语句更新
无论是 MoR 还是 MoW,语义都是完成对指定列的更新。这个适合少量数据,不频繁的更新。
-- 基于导入的批量更新
+#### 基于导入的批量更新
-Doris 支持 Stream Load、Broker Load、Routine Load、Insert Into
等多种导入方式,对于主键表,所有的导入都是“UPSERT”的语义,即如果相同 Key 的行不存在,则插入。对于已经存在的记录,则进行更新。
+Doris 支持多种数据导入方式,包括 Stream Load、Broker Load、Routine Load 以及 Insert Into
等。对于主键表,所有导入操作默认采用“UPSERT”语义:当相同主键的记录不存在时执行插入操作,若记录已存在则进行更新操作。更新方式包括整行更新和部分列更新:
-- 如果更新的是所有列,MoR 和 MoW 的语义是一样的,都是覆盖相同 Key 的所有 Value 列。
+- **整行更新**:Unique Key
表的更新默认为整行更新。在导入数据时,用户可以选择提供所有字段,或仅提供部分字段。当用户只提供部分字段时,Doris
会用默认值填充缺失的字段,生成完整记录并进行更新。
-- 如果更新的是部分列,MoR 和 MoW 的默认语义是一样的,即使用表 Schema 中缺失列的默认值作为缺失列的值,去覆盖旧的记录。
-
-- 如果更新的是部分列,主键模型采用的是 MoW,并且设置了 MySQL Session 变量 partial_columns = true 或者 HTTP
Header partial_columns:true,则被更新的缺失列的值,不是再使用表 Schema 中缺失列的默认值,而是已经存在记录的对应缺失列的值。
+- **部分列更新**:Unique Key MoW 支持部分列更新。用户可以通过设置会话变量
`enable_unique_key_partial_update = true` 或在 HTTP Header 中指定
`partial_columns:true` 来启用此功能。开启后,若导入数据的主键已存在,则仅更新指定的部分字段;若主键不存在,则使用默认值填充缺失字段。
我们会分别在文档 [主键模型的 Update 更新](../update/unique-update) 和
[主键模型的导入更新](../update/update-of-unique-model) 详细介绍两种更新方式。
### 主键模型的更新事务
-无论是使用 Update
语句更新,还是基于导入的批量更新,都可能有多个更新语句或者导入作业在进行,那么多个更新如何生效,如何确保更新的原子性,如何防止数据的不一致,这就是主键模型的更新事务。
+#### 使用`UPDATE`语句更新数据
+
+默认情况下,Doris 不允许在同一时间对同一张表进行多个`UPDATE`操作。`UPDATE`语句通过表级锁来确保事务的一致性。
+
+用户可以通过修改 FE
配置`enable_concurrent_update=true`来调整并发限制。当放宽并发限制时,`UPDATE`语句将不再提供事务保证。
+
+#### 基于导入的批量更新
-主键模型的更新事务
文档会介绍这块内容。在这篇文档中,我们会重点介绍通过引入隐藏列__**DORIS_SEQUENCE_COL__,**如何实现让开发者自己控制哪一个更新生效,这样通过与开发者协同,可以实现更好的更新事务。
+Doris 对所有导入更新操作提供原子性保障,即每次导入数据要么全部成功应用,要么全部失败回滚。
+
+对于并发导入更新,Doris 基于系统内部版本控制(按照导入完成提交的顺序进行分配),使用 MVCC 机制确定并发更新的顺序。
+
+由于多个并发导入更新的提交顺序可能无法预期,若这些并发导入涉及相同主键的更新,则其生效顺序也无法预知,最终的可见结果会因此存在不确定性。为解决此问题,Doris
提供了 sequence 列机制,允许用户在并发导入更新时为每一行数据指定版本,以便明确控制并发更新的结果顺序,实现确定性。
+
+我们将在文档[主键模型的更新事务](../update/unique-update-transaction.md) 中对事务机制进行详细介绍
## 聚合(Aggregate)模型的更新
@@ -87,3 +97,23 @@ Doris 支持 Stream Load、Broker Load、Routine Load、Insert Into
等多种导
New Agg Value = Agg Func ( Old Agg Value, New Column Value)
聚合模型只支持基于导入方式的更新,不支持使用 Update 语句更新。在定义聚合模型表的时候,如果把 value 列的聚合函数定义为
REPLACE_IF_NULL,也可以间接实现类似主键表的部分列更新能力。更多内容,请查看
[聚合模型的导入更新](../update/update-of-aggregate-model)。
+
+## 不同模型/实现的更新能力对比
+
+### 性能对比
+| | Unique Key MoW
| Unique Key MoR | Aggregate Key |
+|----------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------|---------------|
+| 导入速度 |
导入过程中进行数据去重,小批量实时写入相比MoR约有10%-20%的性能损失,大批量导入(例如千万级/亿级数据)相比MoR约有30%-50%的性能损失
| 与Duplicate Key接近 | 与Duplicate Key接近 |
+| 查询速度 | 与Duplicate Key接近
| 需要在查询期间进行去重,查询耗时约为 MoW 的3-10倍 |
如果聚合函数为REPLACE/REPLACE_IF_NOT_NULL,查询速度与MoR接近 |
+| 谓词下推 | 支持
| 不支持 | 不支持 |
+| 资源消耗 | - **导入资源消耗**:相比Duplicate Key/Unique Key
MoR,约额外消耗约10%-30%的CPU。<br /> - **查询资源消耗**:与Duplicate Key接近,无额外资源消耗。<br /> -
**Compaction资源消耗**:相比Duplicate Key,消耗更多内存和CPU,具体取决于数据特征和数据量。 | -
**导入资源消耗**:与Duplicate Key相近,无额外资源消耗。<br /> - **查询资源消耗**:相比Duplicate Key/Unique
Key MoW,查询时额外消耗更多的CPU和内存。<br /> - **Compaction资源消耗**:相比Duplicate
Key,需更多内存和CPU,具体数值取决于数据特征和数据量。 | 与Unique Key MoR相同 |
+
+### 功能支持对比
+| | Unique Key MoW | Unique Key MoR | Aggregate Key |
+|----------------|----------------|----------------|---------------|
+| UPDATE |支持|支持|不支持|
+| DELETE |支持|支持|不支持|
+| sequence列 |支持|支持|不支持|
+| delete_sign |支持|支持|不支持|
+| 部分列更新 |支持|不支持| 支持(但无法更新null值) |
+| 倒排索引 |支持|不支持|不支持|
diff --git a/versioned_docs/version-2.0/data-operate/update/update-overview.md
b/versioned_docs/version-2.0/data-operate/update/update-overview.md
index 039a33569c..a6fd54f285 100644
--- a/versioned_docs/version-2.0/data-operate/update/update-overview.md
+++ b/versioned_docs/version-2.0/data-operate/update/update-overview.md
@@ -58,23 +58,37 @@ Starting from Doris 2.1, MoW is the default mode for the
unique key model. So, i
### Two Update Methods in Unique Key Model
-- Update statement: This method is used to update a specific column and is
suitable for infrequent updates with a small amount of data.
+#### `UPDATE` statement
-- Batch update based on load: Doris supports various load methods such as
Stream Load, Broker Load, Routine Load, and Insert Into. For unique key tables,
all load have the "UPSERT" semantics, meaning that if a row with the same key
does not exist, it will be inserted, and if it already exists, it will be
updated.
+Whether it is MoR or MoW, the semantics ensure that the specified columns are
updated. The time taken for a single UPDATE operation increases with the amount
of data being updated.
-- If all columns are updated, MoR and MoW have the same semantics, which is to
replace all value columns for the same key.
+#### Batch update based on load
-- If only some columns are updated, the default semantics for MoR and MoW are
the same. In this case, the missing columns in the table schema will be updated
with their default values, overwriting the old records.
+Doris supports multiple data load methods, including Stream Load, Broker Load,
Routine Load, and Insert Into. For primary key tables, all load operations
default to "UPSERT" semantics: if a record with the same key does not exist, it
is inserted; if it already exists, it is updated. There are two types of
updates: full row updates and partial column updates.
-- If only some columns are updated and MoW is used in the unique key model,
and the MySQL session variable "partial_columns" is set to true, or the HTTP
header "partial_columns" is set to true, the missing columns will be updated
with the corresponding missing column values from the existing record, instead
of using the default values from the table schema.
+- **Full Row Update**: Updates for Unique Key tables default to full row
updates. During data loading, users can choose to provide either all fields or
only part of them. If only partial fields are provided, Doris will fill in the
missing fields with default values to form a complete record for updating.
+
+- **Partial Column Update**: Unique Key MoW supports partial column updates.
Users can enable this feature by setting the session variable
`enable_unique_key_partial_update = true` or by specifying
`partial_columns:true` in the HTTP Header. When enabled, if a record with the
given primary key exists, only the specified fields are updated; if the primary
key does not exist, the missing fields are filled with default values.
We will provide detailed explanations of these two update methods in the
documentation: [Update in Unique Key Model](../update/unique-update) and [Load
Update in Unique Key Model](../update/update-of-unique-model).
### Update Transactions in Unique Key Model
-Whether you use the update statement or the batch update based on load, there
may be multiple update statements or load jobs in progress. In such cases, it
is important to ensure the effectiveness of multiple updates, maintain
atomicity, and prevent data inconsistency. This is where update transactions in
the unique key model come into play.
+#### Updating Data Using the `UPDATE` Statement
+
+By default, Doris does not allow multiple `UPDATE` operations on the same
table to occur concurrently. The `UPDATE` statement uses table-level locking to
ensure transactional consistency.
+
+Users can adjust concurrency limits by modifying the FE configuration
`enable_concurrent_update=true`. When concurrency limits are relaxed, the
`UPDATE` statement will no longer provide transactional guarantees.
+
+#### Batch Updates Based on Load
+
+Doris provides atomicity for all load update operations—each data load will
either be fully applied or fully rolled back.
+
+For concurrent load updates, Doris determines the order of concurrent updates
using an internal version control system (assigned based on the order of
completed loading), using an MVCC mechanism.
+
+Since the commit order of multiple concurrent load updates may be
unpredictable, if these concurrent load jobs involve updates to the same
primary key, the order in which they take effect is also uncertain. As a
result, the final visible outcome may be indeterminate. To address this issue,
Doris provides a `sequence` column mechanism, allowing users to specify a
version for each row in concurrent load updates, thus ensuring determinism in
the outcome of concurrent updates.
-The documentation on update transactions in the unique key model will cover
these aspects. In this document, we will focus on how to control the
effectiveness of updates by introducing the hidden column
__**DORIS_SEQUENCE_COL__, allowing developers to coordinate and achieve better
update transactions.
+For more detailed information on transaction mechanisms, refer to the
documentation on [Transactional Updates in the Primary Key
Model](../update/unique-update-transaction.md).
## Update in Aggregate Model
@@ -82,4 +96,24 @@ The update in the aggregate model refers to the process of
generating new aggreg
New Agg Value = Agg Func ( Old Agg Value, New Column Value)
-The update in the aggregate model is only supported through load methods and
does not support the use of Update statements. When defining a table in the
aggregate model, if the aggregation function for the value column is defined as
REPLACE_IF_NULL, it indirectly achieves partial column update capabilities
similar to the unique key model. For more details, please refer to the
documentation on [Load Update in the Aggregate
Model](../update/update-of-aggregate-model).
\ No newline at end of file
+The update in the aggregate model is only supported through load methods and
does not support the use of Update statements. When defining a table in the
aggregate model, if the aggregation function for the value column is defined as
REPLACE_IF_NULL, it indirectly achieves partial column update capabilities
similar to the unique key model. For more details, please refer to the
documentation on [Load Update in the Aggregate
Model](../update/update-of-aggregate-model).
+
+## Comparison of Update Capabilities for Different Models/Implementations
+
+### Performance Comparison
+| | Unique Key MoW
| Unique Key MoR | Aggregate Key |
+|----------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------|---------------|
+| Import Speed | Deduplication is performed during import. Small-batch
real-time writes incur approximately 10%-20% performance loss compared to MoR,
while large-batch imports (e.g., tens or hundreds of millions of records) have
about 30%-50% performance loss compared to MoR. | Similar to Duplicate Key |
Similar to Duplicate Key |
+| Query Speed | Similar to Duplicate Key
| Requires deduplication during queries, with
query time approximately 3-10 times that of MoW | If the aggregation function
is REPLACE/REPLACE_IF_NOT_NULL, query speed is similar to MoR |
+| Predicate Pushdown | Supported
| Not Supported | Not Supported |
+| Resource Consumption | - **Import Resource Consumption**: Consumes
approximately 10%-30% more CPU compared to Duplicate Key/Unique Key MoR.<br />
- **Query Resource Consumption**: Similar to Duplicate Key with no additional
resource consumption.<br /> - **Compaction Resource Consumption**: Higher
memory and CPU usage compared to Duplicate Key, specific usage depends on data
characteristics and volume. | - **Import Resource Consumption**: Similar to
Duplicate Key with no additional reso [...]
+
+### Feature Support Comparison
+| | Unique Key MoW | Unique Key MoR | Aggregate Key |
+|----------------|----------------|----------------|---------------|
+| UPDATE | Supported | Supported | Not Supported |
+| DELETE | Supported | Supported | Not Supported |
+| sequence column| Supported | Supported | Not Supported |
+| delete_sign | Supported | Supported | Not Supported |
+| Partial Column Updates | Supported | Not Supported | Supported(can't update
null value) |
+| Inverted Index | Supported | Not Supported | Not Supported |
diff --git a/versioned_docs/version-2.1/data-operate/update/update-overview.md
b/versioned_docs/version-2.1/data-operate/update/update-overview.md
index 039a33569c..a6fd54f285 100644
--- a/versioned_docs/version-2.1/data-operate/update/update-overview.md
+++ b/versioned_docs/version-2.1/data-operate/update/update-overview.md
@@ -58,23 +58,37 @@ Starting from Doris 2.1, MoW is the default mode for the
unique key model. So, i
### Two Update Methods in Unique Key Model
-- Update statement: This method is used to update a specific column and is
suitable for infrequent updates with a small amount of data.
+#### `UPDATE` statement
-- Batch update based on load: Doris supports various load methods such as
Stream Load, Broker Load, Routine Load, and Insert Into. For unique key tables,
all load have the "UPSERT" semantics, meaning that if a row with the same key
does not exist, it will be inserted, and if it already exists, it will be
updated.
+Whether it is MoR or MoW, the semantics ensure that the specified columns are
updated. The time taken for a single UPDATE operation increases with the amount
of data being updated.
-- If all columns are updated, MoR and MoW have the same semantics, which is to
replace all value columns for the same key.
+#### Batch update based on load
-- If only some columns are updated, the default semantics for MoR and MoW are
the same. In this case, the missing columns in the table schema will be updated
with their default values, overwriting the old records.
+Doris supports multiple data load methods, including Stream Load, Broker Load,
Routine Load, and Insert Into. For primary key tables, all load operations
default to "UPSERT" semantics: if a record with the same key does not exist, it
is inserted; if it already exists, it is updated. There are two types of
updates: full row updates and partial column updates.
-- If only some columns are updated and MoW is used in the unique key model,
and the MySQL session variable "partial_columns" is set to true, or the HTTP
header "partial_columns" is set to true, the missing columns will be updated
with the corresponding missing column values from the existing record, instead
of using the default values from the table schema.
+- **Full Row Update**: Updates for Unique Key tables default to full row
updates. During data loading, users can choose to provide either all fields or
only part of them. If only partial fields are provided, Doris will fill in the
missing fields with default values to form a complete record for updating.
+
+- **Partial Column Update**: Unique Key MoW supports partial column updates.
Users can enable this feature by setting the session variable
`enable_unique_key_partial_update = true` or by specifying
`partial_columns:true` in the HTTP Header. When enabled, if a record with the
given primary key exists, only the specified fields are updated; if the primary
key does not exist, the missing fields are filled with default values.
We will provide detailed explanations of these two update methods in the
documentation: [Update in Unique Key Model](../update/unique-update) and [Load
Update in Unique Key Model](../update/update-of-unique-model).
### Update Transactions in Unique Key Model
-Whether you use the update statement or the batch update based on load, there
may be multiple update statements or load jobs in progress. In such cases, it
is important to ensure the effectiveness of multiple updates, maintain
atomicity, and prevent data inconsistency. This is where update transactions in
the unique key model come into play.
+#### Updating Data Using the `UPDATE` Statement
+
+By default, Doris does not allow multiple `UPDATE` operations on the same
table to occur concurrently. The `UPDATE` statement uses table-level locking to
ensure transactional consistency.
+
+Users can adjust concurrency limits by modifying the FE configuration
`enable_concurrent_update=true`. When concurrency limits are relaxed, the
`UPDATE` statement will no longer provide transactional guarantees.
+
+#### Batch Updates Based on Load
+
+Doris provides atomicity for all load update operations—each data load will
either be fully applied or fully rolled back.
+
+For concurrent load updates, Doris determines the order of concurrent updates
using an internal version control system (assigned based on the order of
completed loading), using an MVCC mechanism.
+
+Since the commit order of multiple concurrent load updates may be
unpredictable, if these concurrent load jobs involve updates to the same
primary key, the order in which they take effect is also uncertain. As a
result, the final visible outcome may be indeterminate. To address this issue,
Doris provides a `sequence` column mechanism, allowing users to specify a
version for each row in concurrent load updates, thus ensuring determinism in
the outcome of concurrent updates.
-The documentation on update transactions in the unique key model will cover
these aspects. In this document, we will focus on how to control the
effectiveness of updates by introducing the hidden column
__**DORIS_SEQUENCE_COL__, allowing developers to coordinate and achieve better
update transactions.
+For more detailed information on transaction mechanisms, refer to the
documentation on [Transactional Updates in the Primary Key
Model](../update/unique-update-transaction.md).
## Update in Aggregate Model
@@ -82,4 +96,24 @@ The update in the aggregate model refers to the process of
generating new aggreg
New Agg Value = Agg Func ( Old Agg Value, New Column Value)
-The update in the aggregate model is only supported through load methods and
does not support the use of Update statements. When defining a table in the
aggregate model, if the aggregation function for the value column is defined as
REPLACE_IF_NULL, it indirectly achieves partial column update capabilities
similar to the unique key model. For more details, please refer to the
documentation on [Load Update in the Aggregate
Model](../update/update-of-aggregate-model).
\ No newline at end of file
+The update in the aggregate model is only supported through load methods and
does not support the use of Update statements. When defining a table in the
aggregate model, if the aggregation function for the value column is defined as
REPLACE_IF_NULL, it indirectly achieves partial column update capabilities
similar to the unique key model. For more details, please refer to the
documentation on [Load Update in the Aggregate
Model](../update/update-of-aggregate-model).
+
+## Comparison of Update Capabilities for Different Models/Implementations
+
+### Performance Comparison
+| | Unique Key MoW
| Unique Key MoR | Aggregate Key |
+|----------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------|---------------|
+| Import Speed | Deduplication is performed during import. Small-batch
real-time writes incur approximately 10%-20% performance loss compared to MoR,
while large-batch imports (e.g., tens or hundreds of millions of records) have
about 30%-50% performance loss compared to MoR. | Similar to Duplicate Key |
Similar to Duplicate Key |
+| Query Speed | Similar to Duplicate Key
| Requires deduplication during queries, with
query time approximately 3-10 times that of MoW | If the aggregation function
is REPLACE/REPLACE_IF_NOT_NULL, query speed is similar to MoR |
+| Predicate Pushdown | Supported
| Not Supported | Not Supported |
+| Resource Consumption | - **Import Resource Consumption**: Consumes
approximately 10%-30% more CPU compared to Duplicate Key/Unique Key MoR.<br />
- **Query Resource Consumption**: Similar to Duplicate Key with no additional
resource consumption.<br /> - **Compaction Resource Consumption**: Higher
memory and CPU usage compared to Duplicate Key, specific usage depends on data
characteristics and volume. | - **Import Resource Consumption**: Similar to
Duplicate Key with no additional reso [...]
+
+### Feature Support Comparison
+| | Unique Key MoW | Unique Key MoR | Aggregate Key |
+|----------------|----------------|----------------|---------------|
+| UPDATE | Supported | Supported | Not Supported |
+| DELETE | Supported | Supported | Not Supported |
+| sequence column| Supported | Supported | Not Supported |
+| delete_sign | Supported | Supported | Not Supported |
+| Partial Column Updates | Supported | Not Supported | Supported(can't update
null value) |
+| Inverted Index | Supported | Not Supported | Not Supported |
diff --git a/versioned_docs/version-3.0/data-operate/update/update-overview.md
b/versioned_docs/version-3.0/data-operate/update/update-overview.md
index 039a33569c..a6fd54f285 100644
--- a/versioned_docs/version-3.0/data-operate/update/update-overview.md
+++ b/versioned_docs/version-3.0/data-operate/update/update-overview.md
@@ -58,23 +58,37 @@ Starting from Doris 2.1, MoW is the default mode for the
unique key model. So, i
### Two Update Methods in Unique Key Model
-- Update statement: This method is used to update a specific column and is
suitable for infrequent updates with a small amount of data.
+#### `UPDATE` statement
-- Batch update based on load: Doris supports various load methods such as
Stream Load, Broker Load, Routine Load, and Insert Into. For unique key tables,
all load have the "UPSERT" semantics, meaning that if a row with the same key
does not exist, it will be inserted, and if it already exists, it will be
updated.
+Whether it is MoR or MoW, the semantics ensure that the specified columns are
updated. The time taken for a single UPDATE operation increases with the amount
of data being updated.
-- If all columns are updated, MoR and MoW have the same semantics, which is to
replace all value columns for the same key.
+#### Batch update based on load
-- If only some columns are updated, the default semantics for MoR and MoW are
the same. In this case, the missing columns in the table schema will be updated
with their default values, overwriting the old records.
+Doris supports multiple data load methods, including Stream Load, Broker Load,
Routine Load, and Insert Into. For primary key tables, all load operations
default to "UPSERT" semantics: if a record with the same key does not exist, it
is inserted; if it already exists, it is updated. There are two types of
updates: full row updates and partial column updates.
-- If only some columns are updated and MoW is used in the unique key model,
and the MySQL session variable "partial_columns" is set to true, or the HTTP
header "partial_columns" is set to true, the missing columns will be updated
with the corresponding missing column values from the existing record, instead
of using the default values from the table schema.
+- **Full Row Update**: Updates for Unique Key tables default to full row
updates. During data loading, users can choose to provide either all fields or
only part of them. If only partial fields are provided, Doris will fill in the
missing fields with default values to form a complete record for updating.
+
+- **Partial Column Update**: Unique Key MoW supports partial column updates.
Users can enable this feature by setting the session variable
`enable_unique_key_partial_update = true` or by specifying
`partial_columns:true` in the HTTP Header. When enabled, if a record with the
given primary key exists, only the specified fields are updated; if the primary
key does not exist, the missing fields are filled with default values.
We will provide detailed explanations of these two update methods in the
documentation: [Update in Unique Key Model](../update/unique-update) and [Load
Update in Unique Key Model](../update/update-of-unique-model).
### Update Transactions in Unique Key Model
-Whether you use the update statement or the batch update based on load, there
may be multiple update statements or load jobs in progress. In such cases, it
is important to ensure the effectiveness of multiple updates, maintain
atomicity, and prevent data inconsistency. This is where update transactions in
the unique key model come into play.
+#### Updating Data Using the `UPDATE` Statement
+
+By default, Doris does not allow multiple `UPDATE` operations on the same
table to occur concurrently. The `UPDATE` statement uses table-level locking to
ensure transactional consistency.
+
+Users can adjust concurrency limits by modifying the FE configuration
`enable_concurrent_update=true`. When concurrency limits are relaxed, the
`UPDATE` statement will no longer provide transactional guarantees.
+
+#### Batch Updates Based on Load
+
+Doris provides atomicity for all load update operations—each data load will
either be fully applied or fully rolled back.
+
+For concurrent load updates, Doris determines the order of concurrent updates
using an internal version control system (assigned based on the order of
completed loading), using an MVCC mechanism.
+
+Since the commit order of multiple concurrent load updates may be
unpredictable, if these concurrent load jobs involve updates to the same
primary key, the order in which they take effect is also uncertain. As a
result, the final visible outcome may be indeterminate. To address this issue,
Doris provides a `sequence` column mechanism, allowing users to specify a
version for each row in concurrent load updates, thus ensuring determinism in
the outcome of concurrent updates.
-The documentation on update transactions in the unique key model will cover
these aspects. In this document, we will focus on how to control the
effectiveness of updates by introducing the hidden column
__**DORIS_SEQUENCE_COL__, allowing developers to coordinate and achieve better
update transactions.
+For more detailed information on transaction mechanisms, refer to the
documentation on [Transactional Updates in the Primary Key
Model](../update/unique-update-transaction.md).
## Update in Aggregate Model
@@ -82,4 +96,24 @@ The update in the aggregate model refers to the process of
generating new aggreg
New Agg Value = Agg Func ( Old Agg Value, New Column Value)
-The update in the aggregate model is only supported through load methods and
does not support the use of Update statements. When defining a table in the
aggregate model, if the aggregation function for the value column is defined as
REPLACE_IF_NULL, it indirectly achieves partial column update capabilities
similar to the unique key model. For more details, please refer to the
documentation on [Load Update in the Aggregate
Model](../update/update-of-aggregate-model).
\ No newline at end of file
+The update in the aggregate model is only supported through load methods and
does not support the use of Update statements. When defining a table in the
aggregate model, if the aggregation function for the value column is defined as
REPLACE_IF_NULL, it indirectly achieves partial column update capabilities
similar to the unique key model. For more details, please refer to the
documentation on [Load Update in the Aggregate
Model](../update/update-of-aggregate-model).
+
+## Comparison of Update Capabilities for Different Models/Implementations
+
+### Performance Comparison
+| | Unique Key MoW
| Unique Key MoR | Aggregate Key |
+|----------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------|---------------|
+| Import Speed | Deduplication is performed during import. Small-batch
real-time writes incur approximately 10%-20% performance loss compared to MoR,
while large-batch imports (e.g., tens or hundreds of millions of records) have
about 30%-50% performance loss compared to MoR. | Similar to Duplicate Key |
Similar to Duplicate Key |
+| Query Speed | Similar to Duplicate Key
| Requires deduplication during queries, with
query time approximately 3-10 times that of MoW | If the aggregation function
is REPLACE/REPLACE_IF_NOT_NULL, query speed is similar to MoR |
+| Predicate Pushdown | Supported
| Not Supported | Not Supported |
+| Resource Consumption | - **Import Resource Consumption**: Consumes
approximately 10%-30% more CPU compared to Duplicate Key/Unique Key MoR.<br />
- **Query Resource Consumption**: Similar to Duplicate Key with no additional
resource consumption.<br /> - **Compaction Resource Consumption**: Higher
memory and CPU usage compared to Duplicate Key, specific usage depends on data
characteristics and volume. | - **Import Resource Consumption**: Similar to
Duplicate Key with no additional reso [...]
+
+### Feature Support Comparison
+| | Unique Key MoW | Unique Key MoR | Aggregate Key |
+|----------------|----------------|----------------|---------------|
+| UPDATE | Supported | Supported | Not Supported |
+| DELETE | Supported | Supported | Not Supported |
+| sequence column| Supported | Supported | Not Supported |
+| delete_sign | Supported | Supported | Not Supported |
+| Partial Column Updates | Supported | Not Supported | Supported(can't update
null value) |
+| Inverted Index | Supported | Not Supported | Not Supported |
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]