This is an automated email from the ASF dual-hosted git repository.
luzhijing pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push:
new 17164cf7a8 [fix](docs) add logic for batch delete when sequence column
exists (#17367)
17164cf7a8 is described below
commit 17164cf7a86995959a9168a6b0ca1c69fd9fba8f
Author: xueweizhang <[email protected]>
AuthorDate: Fri Mar 3 16:28:31 2023 +0800
[fix](docs) add logic for batch delete when sequence column exists (#17367)
* [fix](docs) add logic for batch delete when sequence column exists.
Signed-off-by: nextdreamblue <[email protected]>
* add docs
Signed-off-by: nextdreamblue <[email protected]>
* fix docs 2
Signed-off-by: nextdreamblue <[email protected]>
---------
Signed-off-by: nextdreamblue <[email protected]>
---
.../update-delete/batch-delete-manual.md | 83 ++++++++++++++++++++-
.../update-delete/batch-delete-manual.md | 84 +++++++++++++++++++++-
2 files changed, 165 insertions(+), 2 deletions(-)
diff --git a/docs/en/docs/data-operate/update-delete/batch-delete-manual.md
b/docs/en/docs/data-operate/update-delete/batch-delete-manual.md
index 1a72eadcbc..cca98ad504 100644
--- a/docs/en/docs/data-operate/update-delete/batch-delete-manual.md
+++ b/docs/en/docs/data-operate/update-delete/batch-delete-manual.md
@@ -30,7 +30,7 @@ Currently, Doris supports multiple import methods such as
[broker load](../impor
For scenarios similar to the import of cdc data, insert and delete in the data
data generally appear interspersed. In this scenario, our current import method
is not enough, even if we can separate insert and delete, it can solve the
import problem , But still cannot solve the problem of deletion. Use the batch
delete function to solve the needs of these scenarios.
There are three ways to merge data import:
1. APPEND: All data are appended to existing data
-2. DELETE: delete all rows with the same key column value as the imported data
+2. DELETE: delete all rows with the same key column value as the imported
data(When there is a [`sequence`](sequence-column-manual.md) column in the
table, the same primary key and the logic of the size of the sequence column
must be satisfied at the same time to delete it correctly, see use case 4 below
for details.)
3. MERGE: APPEND or DELETE according to DELETE ON decision
## Fundamental
@@ -224,3 +224,84 @@ After load:
+--------+----------+----------+------+
```
+4. When the table has the sequence column, delete all data with the same key
as the imported data
+
+```bash
+curl --location-trusted -u root: -H "column_separator:," -H "columns: name,
gender, age" -H "function_column.sequence_col: age" -H "merge_type: DELETE" -T
~/table1_data http://127.0.0.1:8130/api/test/table1/_stream_load
+```
+
+When the unique table has the sequence column, sequence column is used as the
basis for the replacement order of the REPLACE aggregate function under the
same key column, and the larger value can replace the smaller value.
+If you want delete some data, the imported data must have the same key and the
sequence column must be larger or equal than before.
+
+for example, one table like this:
+```sql
+mysql> SET show_hidden_columns=true;
+Query OK, 0 rows affected (0.00 sec)
+
+mysql> DESC table1;
++------------------------+--------------+------+-------+---------+---------+
+| Field | Type | Null | Key | Default | Extra |
++------------------------+--------------+------+-------+---------+---------+
+| name | VARCHAR(100) | No | true | NULL | |
+| gender | VARCHAR(10) | Yes | false | NULL | REPLACE |
+| age | INT | Yes | false | NULL | REPLACE |
+| __DORIS_DELETE_SIGN__ | TINYINT | No | false | 0 | REPLACE |
+| __DORIS_SEQUENCE_COL__ | INT | Yes | false | NULL | REPLACE |
++------------------------+--------------+------+-------+---------+---------+
+4 rows in set (0.00 sec)
+```
+
+Before load:
+```text
++-------+--------+------+
+| name | gender | age |
++-------+--------+------+
+| li | male | 10 |
+| wang | male | 14 |
+| zhang | male | 12 |
++-------+--------+------+
+```
+
+If you load data like this:
+```text
+li,male,10
+```
+
+After load:
+```text
++-------+--------+------+
+| name | gender | age |
++-------+--------+------+
+| wang | male | 14 |
+| zhang | male | 12 |
++-------+--------+------+
+```
+
+You will find that the data is deleted.
+```text
+li,male,10
+```
+
+But if you load data like this:
+```text
+li,male,9
+```
+
+After load:
+```text
++-------+--------+------+
+| name | gender | age |
++-------+--------+------+
+| li | male | 10 |
+| wang | male | 14 |
+| zhang | male | 12 |
++-------+--------+------+
+```
+
+You will find that the data is not deleted.
+```text
+li,male,10
+```
+This is because in the underlying dependencies, it will first judge the case
of the same key, display the row data with a large value in the sequence
column, and then check whether the `__DORIS_DELETE_SIGN__` value of the row is
1. If it is 1, it will not be displayed. If it is 0, it will still be read out.
+
+**When data is written and deleted at the same time in the imported data (such
as in the Flink CDC scenario), using the sequence column can effectively ensure
the consistency when the data arrives out of order, avoiding the deletion
operation of an old version that arrives later, and accidentally deleting The
new version of the data that arrives first.**
\ No newline at end of file
diff --git a/docs/zh-CN/docs/data-operate/update-delete/batch-delete-manual.md
b/docs/zh-CN/docs/data-operate/update-delete/batch-delete-manual.md
index efa68c33be..a41ba07886 100644
--- a/docs/zh-CN/docs/data-operate/update-delete/batch-delete-manual.md
+++ b/docs/zh-CN/docs/data-operate/update-delete/batch-delete-manual.md
@@ -31,7 +31,7 @@ under the License.
对于类似于cdc数据导入的场景,数据中insert和delete一般是穿插出现的,面对这种场景我们目前的导入方式也无法满足,即使我们能够分离出insert和delete虽然可以解决导入的问题,但是仍然解决不了删除的问题。使用批量删除功能可以解决这些个别场景的需求。数据导入有三种合并方式:
1. APPEND: 数据全部追加到现有数据中;
-2. DELETE: 删除所有与导入数据key 列值相同的行;
+2. DELETE: 删除所有与导入数据key
列值相同的行(当表存在[`sequence`](sequence-column-manual.md)列时,需要同时满足主键相同以及sequence列的大小逻辑才能正确删除,详见下边用例4);
3. MERGE: 根据 DELETE ON 的决定 APPEND 还是 DELETE。
## 基本原理
@@ -243,3 +243,85 @@ curl --location-trusted -u root: -H "column_separator:,"
-H "columns: siteid, ci
+--------+----------+----------+------+
```
+4. 当存在sequence列时,将与导入数据key 相同的数据全部删除
+
+```bash
+curl --location-trusted -u root: -H "column_separator:," -H "columns: name,
gender, age" -H "function_column.sequence_col: age" -H "merge_type: DELETE" -T
~/table1_data http://127.0.0.1:8130/api/test/table1/_stream_load
+```
+
+当unique表设置了sequence列时,在相同key列下,sequence列的值会作为REPLACE聚合函数替换顺序的依据,较大值可以替换较小值。
+当对这种表基于`__DORIS_DELETE_SIGN__`进行删除标记时,需要保证key相同和sequence列值要大于等于当前值。
+
+假设有表,结构如下
+```sql
+mysql> SET show_hidden_columns=true;
+Query OK, 0 rows affected (0.00 sec)
+
+mysql> DESC table1;
++------------------------+--------------+------+-------+---------+---------+
+| Field | Type | Null | Key | Default | Extra |
++------------------------+--------------+------+-------+---------+---------+
+| name | VARCHAR(100) | No | true | NULL | |
+| gender | VARCHAR(10) | Yes | false | NULL | REPLACE |
+| age | INT | Yes | false | NULL | REPLACE |
+| __DORIS_DELETE_SIGN__ | TINYINT | No | false | 0 | REPLACE |
+| __DORIS_SEQUENCE_COL__ | INT | Yes | false | NULL | REPLACE |
++------------------------+--------------+------+-------+---------+---------+
+4 rows in set (0.00 sec)
+```
+
+假设导入表中原有数据为:
+
+```text
++-------+--------+------+
+| name | gender | age |
++-------+--------+------+
+| li | male | 10 |
+| wang | male | 14 |
+| zhang | male | 12 |
++-------+--------+------+
+```
+
+当导入数据为:
+```text
+li,male,10
+```
+
+导入后数据后会变成:
+```text
++-------+--------+------+
+| name | gender | age |
++-------+--------+------+
+| wang | male | 14 |
+| zhang | male | 12 |
++-------+--------+------+
+```
+会发现数据
+```text
+li,male,10
+```
+被删除成功。
+
+但是假如导入数据为:
+```text
+li,male,9
+```
+
+导入后数据会变成:
+```text
++-------+--------+------+
+| name | gender | age |
++-------+--------+------+
+| li | male | 10 |
+| wang | male | 14 |
+| zhang | male | 12 |
++-------+--------+------+
+```
+
+会看到数据
+```text
+li,male,10
+```
+并没有被删除,这是因为在底层的依赖关系上,会先判断key相同的情况,对外展示sequence列的值大的行数据,然后在看该行的`__DORIS_DELETE_SIGN__`值是否为1,如果为1则不会对外展示,如果为0,则仍会读出来。
+
+**当导入数据中同时存在数据写入和删除时(例如Flink
CDC场景中),使用seq列可以有效的保证当数据乱序到达时的一致性,避免后到达的一个旧版本的删除操作,误删掉了先到达的新版本的数据。**
\ No newline at end of file
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]