Copilot commented on code in PR #3217: URL: https://github.com/apache/doris-website/pull/3217#discussion_r2644417841
########## i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/stream-load-manual.md: ########## @@ -1028,5 +1028,5 @@ Doris 可以在导入语句中支持非常丰富的列转换和过滤操作。 ### 导入时进行部分列更新/灵活部分列更新 -关于导入时,如何表达部分列更新,可以参考 [数据更新/主键模型的导入更新](../../../data-operate/update/update-of-unique-model) 文档 +关于导入时,如何表达部分列更新,可以参考 [列更新](../../../data-operate/update/partial-column-update.md) 文档 Review Comment: This reference points to `../../../data-operate/update/partial-column-update.md` but that file does not exist in the `i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/` directory. This will create a broken link. ```suggestion 关于导入时,如何表达部分列更新,可以参考列更新相关文档。 ``` ########## versioned_docs/version-3.x/data-operate/update/update-of-aggregate-model.md: ########## @@ -12,52 +12,6 @@ This document primarily introduces how to update the Doris Aggregate model based When loading data into the Aggregate model table using Doris-supported methods such as Stream Load, Broker Load, Routine Load, Insert Into, etc., the new values will be aggregated with the old values according to the column's aggregation function to produce new aggregated values. This value may be generated at the time of insertion or during asynchronous compaction, but users will get the same return value when querying. -## Partial Column Update of Aggregate Model +## Partial Column Update -The Aggregate table is mainly used in pre-aggregation scenarios rather than data update scenarios, but partial column updates can be achieved by setting the aggregation function to REPLACE_IF_NOT_NULL. - -**Create Table** - -Set the aggregation function of the fields that need to be updated to `REPLACE_IF_NOT_NULL`. - -```sql -CREATE TABLE order_tbl ( - order_id int(11) NULL, - order_amount int(11) REPLACE_IF_NOT_NULL NULL, - order_status varchar(100) REPLACE_IF_NOT_NULL NULL -) ENGINE=OLAP -AGGREGATE KEY(order_id) -COMMENT 'OLAP' -DISTRIBUTED BY HASH(order_id) BUCKETS 1 -PROPERTIES ( -"replication_allocation" = "tag.location.default: 1" -); -``` - -**Data Insertion** - -Whether it is Stream Load, Broker Load, Routine Load, or `INSERT INTO`, directly write the data of the fields to be updated. - -**Example** - -Similar to the previous examples, the corresponding Stream Load command is (no additional header required): - -```shell -$ cat update.csv - -1,To be shipped - -curl --location-trusted -u root: -H "column_separator:," -H "columns:order_id,order_status" -T ./update.csv http://127.0.0.1:8030/api/db1/order_tbl/_stream_load -``` - -The corresponding `INSERT INTO` statement is (no additional session variable settings required): - -```sql -INSERT INTO order_tbl (order_id, order_status) values (1,'Shipped'); -``` - -## Notes on Partial Column Updates - -The Aggregate Key model does not perform any additional processing during the write process, so the write performance is not affected and is the same as normal data load. However, the cost of aggregation during query is relatively high, and the typical aggregation query performance is 5-10 times lower than the Merge-on-Write implementation of the Unique Key model. - -Since the `REPLACE_IF_NOT_NULL` aggregation function only takes effect when the value is not NULL, users cannot change a field value to NULL. +For detailed information on partial column updates for Aggregate Key Model tables, including table creation, data insertion examples, and usage notes, please refer to [Partial Column Update](./partial-column-update.md#partial-column-update-for-aggregate-key-model). 
Review Comment: This reference points to `./partial-column-update.md#partial-column-update-for-aggregate-key-model` but the file `partial-column-update.md` does not exist in the `versioned_docs/version-3.x/data-operate/update/` directory. This will create a broken link. ```suggestion For detailed information on partial column updates for Aggregate Key Model tables, including table creation, data insertion examples, and usage notes, please refer to Partial Column Update. ``` ########## versioned_sidebars/version-4.x-sidebars.json: ########## @@ -241,9 +241,10 @@ "label": "Updating Data", "items": [ "data-operate/update/update-overview", - "data-operate/update/unique-update", + "data-operate/update/unique-update-sql", "data-operate/update/update-of-unique-model", "data-operate/update/update-of-aggregate-model", + "data-operate/update/partial-column-update", Review Comment: The sidebar references `"data-operate/update/partial-column-update"` but this file does not exist in `versioned_docs/version-4.x/data-operate/update/`. This new file needs to be created for the 4.x version documentation. ```suggestion ``` ########## versioned_sidebars/version-4.x-sidebars.json: ########## @@ -241,9 +241,10 @@ "label": "Updating Data", "items": [ "data-operate/update/update-overview", - "data-operate/update/unique-update", + "data-operate/update/unique-update-sql", Review Comment: The sidebar references `"data-operate/update/unique-update-sql"` but this file does not exist in `versioned_docs/version-4.x/data-operate/update/`. The PR description indicates this is a file rename from `unique-update.md`, but the renamed file is missing from this version directory. ```suggestion "data-operate/update/unique-update", ``` ########## docs/data-operate/import/load-manual.md: ########## @@ -39,6 +39,10 @@ Apache Doris offers various methods for importing and integrating data, allowing Each import method in Doris is an implicit transaction by default. For more information on transactions, refer to [Transactions](../transaction.md). +### Partial Column Updates During Import + +Doris supports partial column updates during data import, allowing you to update only specific columns in a table without providing values for all columns. This is particularly useful for updating wide tables or performing incremental updates. For detailed information on how to perform partial column updates for Unique Key Model and Aggregate Key Model tables, please refer to [Partial Column Update](../update/partial-column-update.md). Review Comment: This reference points to `../update/partial-column-update.md` but that file does not exist in the `docs/data-operate/update/` directory. This will create a broken link. ```suggestion Doris supports partial column updates during data import, allowing you to update only specific columns in a table without providing values for all columns. This is particularly useful for updating wide tables or performing incremental updates. For detailed information on how to perform partial column updates for Unique Key Model and Aggregate Key Model tables, please refer to Partial Column Update. 
``` ########## sidebars.ts: ########## @@ -236,9 +236,10 @@ const sidebars: SidebarsConfig = { label: 'Updating Data', items: [ 'data-operate/update/update-overview', - 'data-operate/update/unique-update', + 'data-operate/update/unique-update-sql', 'data-operate/update/update-of-unique-model', 'data-operate/update/update-of-aggregate-model', + 'data-operate/update/partial-column-update', Review Comment: The sidebar configuration references `'data-operate/update/partial-column-update'` but this file does not exist in the `docs/data-operate/update/` directory. This will cause a broken link in the documentation navigation. The referenced file `partial-column-update.md` needs to be created or added to this pull request. ```suggestion ``` ########## docs/data-operate/update/update-of-unique-model.md: ########## @@ -14,451 +14,4 @@ When loading data into the unique key model using Doris-supported methods like S ## Partial Column Update -Partial column update allows you to update specific fields in a table without modifying all fields. You can use the Update statement to perform this operation, which typically involves reading the entire row, updating the desired fields, and writing it back. This read-write transaction is time-consuming and not suitable for large-scale data writing. Doris provides a feature to directly insert or update partial column data in the unique key model load update, bypassing the need to read the entire row first, thus significantly improving update efficiency. - -:::caution Note - -1. Version 2.0 only supports partial column updates in the Merge-on-Write implementation of the Unique Key. -2. Starting from version 2.0.2, partial column updates are supported using INSERT INTO. -3. Partial column updates are not supported on tables with synchronized materialized views. -4. Partial column updates are not allowed on tables doing schema change. -::: - -### Applicable Scenarios - -- Real-time dynamic column updates, requiring frequent updates of specific fields in the table. For example, updating fields related to the latest user behavior in a user tag table for real-time analysis and decision-making in advertising/recommendation systems. -- Merging multiple source tables into one large wide table. -- Data correction. - -### Usage Example - -Assume there is an order table `order_tbl` in Doris, where the order id is the Key column, and the order status and order amount are the Value columns. The data status is as follows: - -| Order id | Order Amount | Order Status | -| -------- | -------------| -------------| -| 1 | 100 | Pending Payment | - -```sql -+----------+--------------+--------------+ -| order_id | order_amount | order_status | -+----------+--------------+--------------+ -| 1 | 100 | Pending Payment | -+----------+--------------+--------------+ -1 row in set (0.01 sec) -``` - -After the user clicks to pay, the Doris system needs to change the order status of the order with order id '1' to 'Pending Shipment'. - -#### Partial Column Update Using Load Methods - -**StreamLoad/BrokerLoad/RoutineLoad** - -Prepare the following CSV file: - -``` -1,Pending Shipment -``` - -Add the following header during load: - -```sql -partial_columns:true -``` - -Specify the columns to be loaded in `columns` (must include all key columns). 
Below is an example of Stream Load: - -```sql -curl --location-trusted -u root: -H "partial_columns:true" -H "column_separator:," -H "columns:order_id,order_status" -T /tmp/update.csv http://127.0.0.1:8030/api/db1/order_tbl/_stream_load -``` - -**INSERT INTO** - -In all data models, the default behavior of `INSERT INTO` when given partial columns is to write the entire row. To prevent misuse, in the Merge-on-Write implementation, `INSERT INTO` maintains the semantics of whole row UPSERT by default. To enable partial column updates, set the following session variable: - -```sql -SET enable_unique_key_partial_update=true; -INSERT INTO order_tbl (order_id, order_status) VALUES (1, 'Pending Shipment'); -``` - -Note that the session variable `enable_insert_strict` defaults to true, enabling strict mode by default. In strict mode, partial column updates do not allow updating non-existent keys. To insert non-existent keys using the insert statement for partial column updates, set `enable_unique_key_partial_update` to true and `enable_insert_strict` to false. - -**Flink Connector** - -If using Flink Connector, add the following configuration: - -```sql -'sink.properties.partial_columns' = 'true', -``` - -Specify the columns to be loaded in `sink.properties.column` (must include all key columns). - -#### Update Result - -The result after the update is as follows: - -```sql -+----------+--------------+--------------+ -| order_id | order_amount | order_status | -+----------+--------------+--------------+ -| 1 | 100 | Pending Shipment | -+----------+--------------+--------------+ -1 row in set (0.01 sec) -``` - -### Usage Notes - -Since the Merge-on-Write implementation needs to complete the entire row of data during writing to ensure optimal query performance, using it for partial column updates may decrease partial load performance. - -Performance optimization suggestions: - -- Use SSDs equipped with NVMe or high-speed SSD cloud disks, as completing data will read a large amount of historical data, generating high read IOPS and throughput. -- Enabling row storage can reduce the IOPS generated when completing data, significantly improving load performance. Enable row storage by setting the following property when creating a table: - -```Plain -"store_row_column" = "true" -``` - -Currently, all rows in the same batch data writing task (whether a load task or `INSERT INTO`) can only update the same columns. To update data with different columns, write in different batches. - -## Flexible Partial Column Updates - -Previously, Doris's partial update feature required that every row in an import update the same columns. Now, Doris supports a more flexible partial update method that allows each row in a single import to update different columns (supported since 3.1.0). - -:::caution Note: - -1. Currently, only the Stream Load import method and tools using Stream Load (e.g. Doris-Flink-Connector) support this feature. -2. The import file must be in JSON format when using flexible column updates. -::: - -### Applicable Scenarios - -When using CDC to synchronize data from a database system to Doris in real-time, the records output by the source system may not contain complete row data, but only the values of the primary keys and the updated columns. In such cases, the columns updated in a batch of data within a time window may differ. Flexible column updates can be used to import data into Doris. 
- -### Usage - -**Enabling Flexible Column Updates for Existing Tables** - -For existing Merge-On-Write tables created in old versions of Doris, after upgrading, you can enable flexible partial updates using the command: `ALTER TABLE db1.tbl1 ENABLE FEATURE "UPDATE_FLEXIBLE_COLUMNS";`. After executing this command, if the result of `show create table db1.tbl1` includes `"enable_unique_key_skip_bitmap_column" = "true"`, the feature has been successfully enabled. Ensure that the target table has the light-schema-change feature enabled beforehand. - -**Using Flexible Column Updates for New Tables** - -For new tables, to use the flexible column update feature, specify the following table properties when creating the table to enable Merge-on-Write and include the required hidden bitmap column for flexible column updates: - -```Plain -"enable_unique_key_merge_on_write" = "true" -"enable_unique_key_skip_bitmap_column" = "true" -``` - -**StreamLoad** - -When using Stream Load, add the following header: - -```Plain -unique_key_update_mode:UPDATE_FLEXIBLE_COLUMNS -``` - -**Flink Doris Connector** - -If using the Flink Doris Connector, add the following configuration: - -```Plain -'sink.properties.unique_key_update_mode' = 'UPDATE_FLEXIBLE_COLUMNS' -``` - -### Example - -Assuming the following table: - -```sql -CREATE TABLE t1 ( - `k` int(11) NULL, - `v1` BIGINT NULL, - `v2` BIGINT NULL DEFAULT "9876", - `v3` BIGINT NOT NULL, - `v4` BIGINT NOT NULL DEFAULT "1234", - `v5` BIGINT NULL -) UNIQUE KEY(`k`) DISTRIBUTED BY HASH(`k`) BUCKETS 1 -PROPERTIES( -"replication_num" = "3", -"enable_unique_key_merge_on_write" = "true", -"enable_unique_key_skip_bitmap_column" = "true"); -``` - -The original data in the table is: - -```sql -MySQL root@<host>:d1> select * from t1; -+---+----+----+----+----+----+ -| k | v1 | v2 | v3 | v4 | v5 | -+---+----+----+----+----+----+ -| 0 | 0 | 0 | 0 | 0 | 0 | -| 1 | 1 | 1 | 1 | 1 | 1 | -| 2 | 2 | 2 | 2 | 2 | 2 | -| 3 | 3 | 3 | 3 | 3 | 3 | -| 4 | 4 | 4 | 4 | 4 | 4 | -| 5 | 5 | 5 | 5 | 5 | 5 | -+---+----+----+----+----+----+ -``` - -Now, updating some fields using flexible column updates: - -```shell -$ cat test1.json -``` -```json -{"k": 0, "__DORIS_DELETE_SIGN__": 1} -{"k": 1, "v1": 10} -{"k": 2, "v2": 20, "v5": 25} -{"k": 3, "v3": 30} -{"k": 4, "v4": 20, "v1": 43, "v3": 99} -{"k": 5, "v5": null} -{"k": 6, "v1": 999, "v3": 777} -{"k": 2, "v4": 222} -{"k": 1, "v2": 111, "v3": 111} -``` -```shell -curl --location-trusted -u root: \ --H "strict_mode:false" \ --H "format:json" \ --H "read_json_by_line:true" \ --H "unique_key_update_mode:UPDATE_FLEXIBLE_COLUMNS" \ --T test1.json \ --XPUT http://<host>:<http_port>/api/d1/t1/_stream_load -``` - -After the update, the data in the table is: - -```sql -MySQL root@<host>:d1> select * from t1; -+---+-----+------+-----+------+--------+ -| k | v1 | v2 | v3 | v4 | v5 | -+---+-----+------+-----+------+--------+ -| 1 | 10 | 111 | 111 | 1 | 1 | -| 2 | 2 | 20 | 2 | 222 | 25 | -| 3 | 3 | 3 | 30 | 3 | 3 | -| 4 | 43 | 4 | 99 | 20 | 4 | -| 5 | 5 | 5 | 5 | 5 | <null> | -| 6 | 999 | 9876 | 777 | 1234 | <null> | -+---+-----+------+-----+------+--------+ -``` - -### Limitations and Considerations - -1. Similar to previous partial updates, flexible column updates require that each row of imported data include all key columns. Rows not meeting this requirement will be filtered out and counted in filter rows. 
If the number of filtered rows exceeds the `max_filter_ratio` threshold for this import, the entire import will fail, and filtered data will generate an error log. - -2. In flexible partial update loads, key-value pairs in each JSON object are only valid if the key matches a column name in the target table. Key-value pairs that do not meet this requirement will be ignored. Pairs with keys `__DORIS_VERSION_COL__`, `__DORIS_ROW_STORE_COL__`, or `__DORIS_SKIP_BITMAP_COL__` will also be ignored. - -3. If the table properties of the target table include `function_column.sequence_type`, the import can specify the value for the `__DORIS_SEQUENCE_COL__` column by including a key-value pair in the JSON object with key `__DORIS_SEQUENCE_COL__`. For rows that do not specify a value for the `__DORIS_SEQUENCE_COL__`, if the key exists in the original table, the value will be filled from the old row; otherwise, it will be set to null. - -For example, for the following table: - -```sql -CREATE TABLE t2 ( - `k` int(11) NULL, - `v1` BIGINT NULL, - `v2` BIGINT NULL DEFAULT "9876", - `v3` BIGINT NOT NULL, - `v4` BIGINT NOT NULL DEFAULT "1234", - `v5` BIGINT NULL -) UNIQUE KEY(`k`) DISTRIBUTED BY HASH(`k`) BUCKETS 1 -PROPERTIES( -"replication_num" = "3", -"enable_unique_key_merge_on_write" = "true", -"enable_unique_key_skip_bitmap_column" = "true", -"function_column.sequence_type" = "int"); -``` - -The original data in the table is: - -```sql -+---+----+----+----+----+----+----------------------+ -| k | v1 | v2 | v3 | v4 | v5 |__DORIS_SEQUENCE_COL__| -+---+----+----+----+----+----+----------------------+ -| 0 | 0 | 0 | 0 | 0 | 0 | 0 | -| 1 | 1 | 1 | 1 | 1 | 10 | 10 | -| 2 | 2 | 2 | 2 | 2 | 20 | 20 | -| 3 | 3 | 3 | 3 | 3 | 30 | 30 | -| 4 | 4 | 4 | 4 | 4 | 40 | 40 | -| 5 | 5 | 5 | 5 | 5 | 50 | 50 | -+---+----+----+----+----+----+----------------------+ -``` - -Importing data using flexible partial column updates: - -```json -{"k": 1, "v1": 111, "v5": 9, "__DORIS_SEQUENCE_COL__": 9} -{"k": 2, "v2": 222, "v5": 25, "__DORIS_SEQUENCE_COL__": 25} -{"k": 3, "v3": 333} -{"k": 4, "v4": 444, "v5": 50, "v1": 411, "v3": 433, "v2": null, "__DORIS_SEQUENCE_COL__": 50} -{"k": 5, "v5": null, "__DORIS_SEQUENCE_COL__": null} -{"k": 6, "v1": 611, "v3": 633} -{"k": 7, "v3": 733, "v5": 300, "__DORIS_SEQUENCE_COL__": 300} -``` - -The final data in the table is as follows: - -```sql -+---+--------+--------+-----+------+--------+ -| k | v1 | v2 | v3 | v4 | v5 | -+---+--------+--------+-----+------+--------+ -| 0 | 0 | 0 | 0 | 0 | 0 | -| 1 | 1 | 1 | 1 | 1 | 10 | -| 5 | 5 | 5 | 5 | 5 | 50 | -| 2 | 2 | 222 | 2 | 2 | 25 | -| 3 | 3 | 3 | 333 | 3 | 30 | -| 4 | 411 | <null> | 433 | 444 | 50 | -| 6 | 611 | 9876 | 633 | 1234 | <null> | -| 7 | <null> | 9876 | 733 | 1234 | 300 | -+---+--------+--------+-----+------+--------+ -``` - -4. When the table property `function_column.sequence_col` is set in the target table, the key-value pair in the JSON object for flexible column updates with the key `__DORIS_SEQUENCE_COL__` will be ignored. The value of the `__DORIS_SEQUENCE_COL__` column in a specific row during import will be the same as the final value of the column specified by the table property `function_column.sequence_col` for that row. 
- -For example, for the following table: - -```sql -CREATE TABLE t3 ( - `k` int(11) NULL, - `v1` BIGINT NULL, - `v2` BIGINT NULL DEFAULT "9876", - `v3` BIGINT NOT NULL, - `v4` BIGINT NOT NULL DEFAULT "1234", - `v5` BIGINT NULL DEFAULT "31" -) UNIQUE KEY(`k`) DISTRIBUTED BY HASH(`k`) BUCKETS 1 -PROPERTIES( -"replication_num" = "3", -"enable_unique_key_merge_on_write" = "true", -"enable_unique_key_skip_bitmap_column" = "true", -"function_column.sequence_col" = "v5"); -``` - -The original data in the table is as follows: - -```sql -+---+----+----+----+----+----+----------------------+ -| k | v1 | v2 | v3 | v4 | v5 |__DORIS_SEQUENCE_COL__| -+---+----+----+----+----+----+----------------------+ -| 0 | 0 | 0 | 0 | 0 | 0 | 0 | -| 1 | 1 | 1 | 1 | 1 | 10 | 10 | -| 2 | 2 | 2 | 2 | 2 | 20 | 20 | -| 3 | 3 | 3 | 3 | 3 | 30 | 30 | -| 4 | 4 | 4 | 4 | 4 | 40 | 40 | -| 5 | 5 | 5 | 5 | 5 | 50 | 50 | -+---+----+----+----+----+----+----------------------+ -``` - -Using flexible partial updates, the following data is imported: - -```json -{"k": 1, "v1": 111, "v5": 9} -{"k": 2, "v2": 222, "v5": 25} -{"k": 3, "v3": 333} -{"k": 4, "v4": 444, "v5": 50, "v1": 411, "v3": 433, "v2": null} -{"k": 5, "v5": null} -{"k": 6, "v1": 611, "v3": 633} -{"k": 7, "v3": 733, "v5": 300} -``` - -The final data in the table is as follows: - -```sql -+---+--------+--------+-----+------+-----+ -| k | v1 | v2 | v3 | v4 | v5 | -+---+--------+--------+-----+------+-----+ -| 0 | 0 | 0 | 0 | 0 | 0 | -| 1 | 1 | 1 | 1 | 1 | 10 | -| 5 | 5 | 5 | 5 | 5 | 50 | -| 2 | 2 | 222 | 2 | 2 | 25 | -| 3 | 3 | 3 | 333 | 3 | 30 | -| 4 | 411 | <null> | 433 | 444 | 50 | -| 6 | 611 | 9876 | 633 | 1234 | 31 | -| 7 | <null> | 9876 | 733 | 1234 | 300 | -+---+--------+--------+-----+------+-----+ -``` - -5. When using flexible partial updates, the following import parameters cannot be specified or enabled: - - The `merge_type` parameter cannot be specified. - - The `delete` parameter cannot be specified. - - The `fuzzy_parse` parameter cannot be enabled. - - The `columns` parameter cannot be specified. - - The `jsonpaths` parameter cannot be specified. - - The `hidden_columns` parameter cannot be specified. - - The `function_column.sequence_col` parameter cannot be specified. - - The `sql` parameter cannot be specified. - - The `memtable_on_sink_node` option cannot be enabled. - - The `group_commit` parameter cannot be specified. - - The `where` parameter cannot be specified. - -6. Flexible partial updates are not supported on tables with Variant columns. - -7. Flexible partial updates are not supported on tables with synchronous materialized views. - - -## Handling New Rows in Partial Column Updates - -The session variable or import property `partial_update_new_key_behavior` controls the behavior when inserting new rows during partial column updates. - -When `partial_update_new_key_behavior=ERROR`, each inserted row must have a key that already exists in the table. When `partial_update_new_key_behavior=APPEND`, partial column updates can update existing rows with matching keys or insert new rows with keys that do not exist in the table. 
- -For example, consider the following table structure: -```sql -CREATE TABLE user_profile -( - id INT, - name VARCHAR(10), - age INT, - city VARCHAR(10), - balance DECIMAL(9, 0), - last_access_time DATETIME -) ENGINE=OLAP -UNIQUE KEY(id) -DISTRIBUTED BY HASH(id) BUCKETS 1 -PROPERTIES ( - "enable_unique_key_merge_on_write" = "true" -); -``` - -Suppose the table contains the following data: -```sql -mysql> select * from user_profile; -+------+-------+------+----------+---------+---------------------+ -| id | name | age | city | balance | last_access_time | -+------+-------+------+----------+---------+---------------------+ -| 1 | kevin | 18 | shenzhen | 400 | 2023-07-01 12:00:00| -+------+-------+------+----------+---------+---------------------+ -``` - -If you use `Insert Into` for partial column updates with `partial_update_new_key_behavior=ERROR`, and try to insert the following data, the operation will fail because the keys `(3)` and `(18)` do not exist in the original table: -```sql -SET enable_unique_key_partial_update=true; -SET partial_update_new_key_behavior=ERROR; -INSERT INTO user_profile (id, balance, last_access_time) VALUES -(1, 500, '2023-07-03 12:00:01'), -(3, 23, '2023-07-03 12:00:02'), -(18, 9999999, '2023-07-03 12:00:03'); -(1105, "errCode = 2, detailMessage = (127.0.0.1)[INTERNAL_ERROR]tablet error: [E-7003]Can't append new rows in partial update when partial_update_new_key_behavior is ERROR. Row with key=[3] is not in table., host: 127.0.0.1") -``` - -If you use `partial_update_new_key_behavior=APPEND` and perform the same partial column update: -```sql -SET enable_unique_key_partial_update=true; -SET partial_update_new_key_behavior=APPEND; -INSERT INTO user_profile (id, balance, last_access_time) VALUES -(1, 500, '2023-07-03 12:00:01'), -(3, 23, '2023-07-03 12:00:02'), -(18, 9999999, '2023-07-03 12:00:03'); -``` - -The existing row will be updated, and two new rows will be inserted. For columns not specified in the inserted data, if a default value is defined, the default will be used; if the column is nullable, NULL will be used; otherwise, the insert will fail. - -The query result will be: -```sql -mysql> select * from user_profile; -+------+-------+------+----------+---------+---------------------+ -| id | name | age | city | balance | last_access_time | -+------+-------+------+----------+---------+---------------------+ -| 1 | kevin | 18 | shenzhen | 500 | 2023-07-03 12:00:01 | -| 3 | NULL | NULL | NULL | 23 | 2023-07-03 12:00:02 | -| 18 | NULL | NULL | NULL | 9999999 | 2023-07-03 12:00:03 | -+------+-------+------+----------+---------+---------------------+ -``` +For detailed information on partial column updates for Unique Key Model tables, including usage examples, flexible partial column updates, and handling new rows, please refer to [Partial Column Update](./partial-column-update.md#partial-column-update-for-unique-key-model). Review Comment: This reference points to `./partial-column-update.md#partial-column-update-for-unique-key-model` but the file `partial-column-update.md` does not exist in the `docs/data-operate/update/` directory. This will create a broken link. ```suggestion For detailed information on partial column updates for Unique Key Model tables, including usage examples, flexible partial column updates, and handling new rows, please refer to the Partial Column Update documentation. 
``` ########## versioned_docs/version-4.x/data-operate/update/update-of-aggregate-model.md: ########## @@ -12,52 +12,6 @@ This document primarily introduces how to update the Doris Aggregate model based When loading data into the Aggregate model table using Doris-supported methods such as Stream Load, Broker Load, Routine Load, Insert Into, etc., the new values will be aggregated with the old values according to the column's aggregation function to produce new aggregated values. This value may be generated at the time of insertion or during asynchronous compaction, but users will get the same return value when querying. -## Partial Column Update of Aggregate Model +## Partial Column Update -The Aggregate table is mainly used in pre-aggregation scenarios rather than data update scenarios, but partial column updates can be achieved by setting the aggregation function to REPLACE_IF_NOT_NULL. - -**Create Table** - -Set the aggregation function of the fields that need to be updated to `REPLACE_IF_NOT_NULL`. - -```sql -CREATE TABLE order_tbl ( - order_id int(11) NULL, - order_amount int(11) REPLACE_IF_NOT_NULL NULL, - order_status varchar(100) REPLACE_IF_NOT_NULL NULL -) ENGINE=OLAP -AGGREGATE KEY(order_id) -COMMENT 'OLAP' -DISTRIBUTED BY HASH(order_id) BUCKETS 1 -PROPERTIES ( -"replication_allocation" = "tag.location.default: 1" -); -``` - -**Data Insertion** - -Whether it is Stream Load, Broker Load, Routine Load, or `INSERT INTO`, directly write the data of the fields to be updated. - -**Example** - -Similar to the previous examples, the corresponding Stream Load command is (no additional header required): - -```shell -$ cat update.csv - -1,To be shipped - -curl --location-trusted -u root: -H "column_separator:," -H "columns:order_id,order_status" -T ./update.csv http://127.0.0.1:8030/api/db1/order_tbl/_stream_load -``` - -The corresponding `INSERT INTO` statement is (no additional session variable settings required): - -```sql -INSERT INTO order_tbl (order_id, order_status) values (1,'Shipped'); -``` - -## Notes on Partial Column Updates - -The Aggregate Key model does not perform any additional processing during the write process, so the write performance is not affected and is the same as normal data load. However, the cost of aggregation during query is relatively high, and the typical aggregation query performance is 5-10 times lower than the Merge-on-Write implementation of the Unique Key model. - -Since the `REPLACE_IF_NOT_NULL` aggregation function only takes effect when the value is not NULL, users cannot change a field value to NULL. +For detailed information on partial column updates for Aggregate Key Model tables, including table creation, data insertion examples, and usage notes, please refer to [Partial Column Update](./partial-column-update.md#partial-column-update-for-aggregate-key-model). Review Comment: This reference points to `./partial-column-update.md#partial-column-update-for-aggregate-key-model` but the file `partial-column-update.md` does not exist in the `versioned_docs/version-4.x/data-operate/update/` directory. This will create a broken link. ```suggestion For detailed information on partial column updates for Aggregate Key Model tables, including table creation, data insertion examples, and usage notes, please refer to the Partial Column Update documentation. 
``` ########## i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/load-manual.md: ########## @@ -36,6 +36,10 @@ Apache Doris 提供了多种导入和集成数据的方法,您可以使用合 Doris 的每个导入默认都是一个隐式事务,事务相关的更多信息请参考[事务](../transaction.md)。 +### 导入时进行部分列更新 + +Doris 支持在数据导入时进行部分列更新,允许您只更新表中的特定列,而不需要提供所有列的值。这对于更新宽表或执行增量更新特别有用。关于如何对主键模型和聚合模型表进行部分列更新的详细信息,请参考[列更新](../update/partial-column-update.md)。 Review Comment: This reference points to `../update/partial-column-update.md` but that file does not exist in the `i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/` directory. This will create a broken link. ```suggestion Doris 支持在数据导入时进行部分列更新,允许您只更新表中的特定列,而不需要提供所有列的值。这对于更新宽表或执行增量更新特别有用。关于如何对主键模型和聚合模型表进行部分列更新的详细信息,请参考[列更新](../update/)。 ``` ########## i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/update-of-aggregate-model.md: ########## @@ -12,52 +12,6 @@ 使用 Doris 支持的 Stream Load,Broker Load,Routine Load,Insert Into 等导入方式,往聚合模型(Agg 模型)中进行数据导入时,都会将新的值与旧的聚合值,根据列的聚合函数产出新的聚合值,这个值可能是插入时产出,也可能是异步 Compaction 时产出,但是用户查询时,都会得到一样的返回值。 -## 聚合模型的部分列更新 +## 部分列更新 -Aggregate 表主要在预聚合场景使用而非数据更新的场景使用,但也可以通过将聚合函数设置为 REPLACE_IF_NOT_NULL 来实现部分列更新效果。 - -**建表** - -将需要进行列更新的字段对应的聚合函数设置为`REPLACE_IF_NOT_NULL` - -```sql -CREATE TABLE order_tbl ( - order_id int(11) NULL, - order_amount int(11) REPLACE_IF_NOT_NULL NULL, - order_status varchar(100) REPLACE_IF_NOT_NULL NULL -) ENGINE=OLAP -AGGREGATE KEY(order_id) -COMMENT 'OLAP' -DISTRIBUTED BY HASH(order_id) BUCKETS 1 -PROPERTIES ( -"replication_allocation" = "tag.location.default: 1" -); -``` - -**数据写入** - -无论是 Stream Load、Broker Load、Routine Load 还是`INSERT INTO`, 直接写入要更新的字段的数据即可 - -**示例** - -与前面例子相同,对应的 Stream Load 命令为(不需要额外的 header): - -```shell -$ cat update.csv - -1,To be shipped - -curl --location-trusted -u root: -H "column_separator:," -H "columns:order_id,order_status" -T /tmp/update.csv http://127.0.0.1:8030/api/db1/order_tbl/_stream_load -``` - -对应的`INSERT INTO`语句为(不需要额外设置 session variable): - -```sql -INSERT INTO order_tbl (order_id, order_status) values (1,'待发货'); -``` - -## 部分列更新使用注意 - -Aggregate Key 模型在写入过程中不做任何额外处理,所以写入性能不受影响,与普通的数据导入相同。但是在查询时进行聚合的代价较大,典型的聚合查询性能相比 Unique Key 模型的 Merge-on-Write 实现会有 5-10 倍的下降。 - -由于 `REPLACE_IF_NOT_NULL` 聚合函数仅在非 NULL 值时才会生效,因此用户无法将某个字段值修改为 NULL 值。 +关于聚合模型(Aggregate Key Model)表的列更新详细说明,包括建表、数据写入示例和使用注意事项等内容,请参考[列更新](./partial-column-update.md#聚合模型的列更新)文档。 Review Comment: This reference points to `./partial-column-update.md#聚合模型的列更新` but the file `partial-column-update.md` does not exist in the `i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/` directory. This will create a broken link. The Chinese version of the file needs to be created. ```suggestion 关于聚合模型(Aggregate Key Model)表的列更新详细说明,包括建表、数据写入示例和使用注意事项等内容,请参考“列更新”文档。 ``` ########## versioned_sidebars/version-3.x-sidebars.json: ########## @@ -225,9 +225,10 @@ "label": "Updating Data", "items": [ "data-operate/update/update-overview", - "data-operate/update/unique-update", + "data-operate/update/unique-update-sql", "data-operate/update/update-of-unique-model", "data-operate/update/update-of-aggregate-model", + "data-operate/update/partial-column-update", Review Comment: The sidebar references `"data-operate/update/partial-column-update"` but this file does not exist in `versioned_docs/version-3.x/data-operate/update/`. This new file needs to be created for the 3.x version documentation. 
```suggestion ``` ########## i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/stream-load-manual.md: ########## @@ -334,7 +334,7 @@ Stream Load 操作支持 HTTP 分块导入(HTTP chunked)与 HTTP 非分块 | enclose | 指定包围符。当 CSV 数据字段中含有行分隔符或列分隔符时,为防止意外截断,可指定单字节字符作为包围符起到保护作用。例如列分隔符为 ",",包围符为 "'",数据为 "a,'b,c'",则 "b,c" 会被解析为一个字段。注意:当 enclose 设置为`"`时,trim_double_quotes 一定要设置为 true。 | | escape | 指定转义符。用于转义在字段中出现的与包围符相同的字符。例如数据为 "a,'b,'c'",包围符为 "'",希望 "b,'c 被作为一个字段解析,则需要指定单字节转义符,例如"\",将数据修改为 "a,'b,\'c'"。 | | memtable_on_sink_node | 导入数据的时候是否开启 MemTable 前移,默认为 false。 | -| unique_key_update_mode | Unique 表上的更新模式,目前仅对 Merge-On-Write Unique 表有效,一共支持三种类型 `UPSERT`, `UPDATE_FIXED_COLUMNS`, `UPDATE_FLEXIBLE_COLUMNS`。 `UPSERT`: 表示以 upsert 语义导入数据; `UPDATE_FIXED_COLUMNS`: 表示以[部分列更新](../../../data-operate/update/update-of-unique-model)的方式导入数据; `UPDATE_FLEXIBLE_COLUMNS`: 表示以[灵活部分列更新](../../../data-operate/update/update-of-unique-model)的方式导入数据| +| unique_key_update_mode | Unique 表上的更新模式,目前仅对 Merge-On-Write Unique 表有效,一共支持三种类型 `UPSERT`, `UPDATE_FIXED_COLUMNS`, `UPDATE_FLEXIBLE_COLUMNS`。 `UPSERT`: 表示以 upsert 语义导入数据; `UPDATE_FIXED_COLUMNS`: 表示以[部分列更新](../../../data-operate/update/partial-column-update.md#主键模型的列更新)的方式导入数据; `UPDATE_FLEXIBLE_COLUMNS`: 表示以[灵活部分列更新](../../../data-operate/update/partial-column-update.md#灵活部分列更新)的方式导入数据| Review Comment: This file references `partial-column-update.md` in two places but that file does not exist in the `i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/` directory. The links on this line will be broken. The Chinese version of `partial-column-update.md` needs to be created with appropriate anchors. ########## versioned_docs/version-4.x/data-operate/update/update-of-unique-model.md: ########## @@ -14,451 +14,4 @@ When loading data into the unique key model using Doris-supported methods like S ## Partial Column Update -Partial column update allows you to update specific fields in a table without modifying all fields. You can use the Update statement to perform this operation, which typically involves reading the entire row, updating the desired fields, and writing it back. This read-write transaction is time-consuming and not suitable for large-scale data writing. Doris provides a feature to directly insert or update partial column data in the unique key model load update, bypassing the need to read the entire row first, thus significantly improving update efficiency. - -:::caution Note - -1. Version 2.0 only supports partial column updates in the Merge-on-Write implementation of the Unique Key. -2. Starting from version 2.0.2, partial column updates are supported using INSERT INTO. -3. Partial column updates are not supported on tables with synchronized materialized views. -4. Partial column updates are not allowed on tables doing schema change. -::: - -### Applicable Scenarios - -- Real-time dynamic column updates, requiring frequent updates of specific fields in the table. For example, updating fields related to the latest user behavior in a user tag table for real-time analysis and decision-making in advertising/recommendation systems. -- Merging multiple source tables into one large wide table. -- Data correction. - -### Usage Example - -Assume there is an order table `order_tbl` in Doris, where the order id is the Key column, and the order status and order amount are the Value columns. 
The data status is as follows: - -| Order id | Order Amount | Order Status | -| -------- | -------------| -------------| -| 1 | 100 | Pending Payment | - -```sql -+----------+--------------+--------------+ -| order_id | order_amount | order_status | -+----------+--------------+--------------+ -| 1 | 100 | Pending Payment | -+----------+--------------+--------------+ -1 row in set (0.01 sec) -``` - -After the user clicks to pay, the Doris system needs to change the order status of the order with order id '1' to 'Pending Shipment'. - -#### Partial Column Update Using Load Methods - -**StreamLoad/BrokerLoad/RoutineLoad** - -Prepare the following CSV file: - -``` -1,Pending Shipment -``` - -Add the following header during load: - -```sql -partial_columns:true -``` - -Specify the columns to be loaded in `columns` (must include all key columns). Below is an example of Stream Load: - -```sql -curl --location-trusted -u root: -H "partial_columns:true" -H "column_separator:," -H "columns:order_id,order_status" -T /tmp/update.csv http://127.0.0.1:8030/api/db1/order_tbl/_stream_load -``` - -**INSERT INTO** - -In all data models, the default behavior of `INSERT INTO` when given partial columns is to write the entire row. To prevent misuse, in the Merge-on-Write implementation, `INSERT INTO` maintains the semantics of whole row UPSERT by default. To enable partial column updates, set the following session variable: - -```sql -SET enable_unique_key_partial_update=true; -INSERT INTO order_tbl (order_id, order_status) VALUES (1, 'Pending Shipment'); -``` - -Note that the session variable `enable_insert_strict` defaults to true, enabling strict mode by default. In strict mode, partial column updates do not allow updating non-existent keys. To insert non-existent keys using the insert statement for partial column updates, set `enable_unique_key_partial_update` to true and `enable_insert_strict` to false. - -**Flink Connector** - -If using Flink Connector, add the following configuration: - -```sql -'sink.properties.partial_columns' = 'true', -``` - -Specify the columns to be loaded in `sink.properties.column` (must include all key columns). - -#### Update Result - -The result after the update is as follows: - -```sql -+----------+--------------+--------------+ -| order_id | order_amount | order_status | -+----------+--------------+--------------+ -| 1 | 100 | Pending Shipment | -+----------+--------------+--------------+ -1 row in set (0.01 sec) -``` - -### Usage Notes - -Since the Merge-on-Write implementation needs to complete the entire row of data during writing to ensure optimal query performance, using it for partial column updates may decrease partial load performance. - -Performance optimization suggestions: - -- Use SSDs equipped with NVMe or high-speed SSD cloud disks, as completing data will read a large amount of historical data, generating high read IOPS and throughput. -- Enabling row storage can reduce the IOPS generated when completing data, significantly improving load performance. Enable row storage by setting the following property when creating a table: - -```Plain -"store_row_column" = "true" -``` - -Currently, all rows in the same batch data writing task (whether a load task or `INSERT INTO`) can only update the same columns. To update data with different columns, write in different batches. - -## Flexible Partial Column Updates - -Previously, Doris's partial update feature required that every row in an import update the same columns. 
Now, Doris supports a more flexible partial update method that allows each row in a single import to update different columns (supported since 3.1.0). - -:::caution Note: - -1. Currently, only the Stream Load import method and tools using Stream Load (e.g. Doris-Flink-Connector) support this feature. -2. The import file must be in JSON format when using flexible column updates. -::: - -### Applicable Scenarios - -When using CDC to synchronize data from a database system to Doris in real-time, the records output by the source system may not contain complete row data, but only the values of the primary keys and the updated columns. In such cases, the columns updated in a batch of data within a time window may differ. Flexible column updates can be used to import data into Doris. - -### Usage - -**Enabling Flexible Column Updates for Existing Tables** - -For existing Merge-On-Write tables created in old versions of Doris, after upgrading, you can enable flexible partial updates using the command: `ALTER TABLE db1.tbl1 ENABLE FEATURE "UPDATE_FLEXIBLE_COLUMNS";`. After executing this command, if the result of `show create table db1.tbl1` includes `"enable_unique_key_skip_bitmap_column" = "true"`, the feature has been successfully enabled. Ensure that the target table has the light-schema-change feature enabled beforehand. - -**Using Flexible Column Updates for New Tables** - -For new tables, to use the flexible column update feature, specify the following table properties when creating the table to enable Merge-on-Write and include the required hidden bitmap column for flexible column updates: - -```Plain -"enable_unique_key_merge_on_write" = "true" -"enable_unique_key_skip_bitmap_column" = "true" -``` - -**StreamLoad** - -When using Stream Load, add the following header: - -```Plain -unique_key_update_mode:UPDATE_FLEXIBLE_COLUMNS -``` - -**Flink Doris Connector** - -If using the Flink Doris Connector, add the following configuration: - -```Plain -'sink.properties.unique_key_update_mode' = 'UPDATE_FLEXIBLE_COLUMNS' -``` - -### Example - -Assuming the following table: - -```sql -CREATE TABLE t1 ( - `k` int(11) NULL, - `v1` BIGINT NULL, - `v2` BIGINT NULL DEFAULT "9876", - `v3` BIGINT NOT NULL, - `v4` BIGINT NOT NULL DEFAULT "1234", - `v5` BIGINT NULL -) UNIQUE KEY(`k`) DISTRIBUTED BY HASH(`k`) BUCKETS 1 -PROPERTIES( -"replication_num" = "3", -"enable_unique_key_merge_on_write" = "true", -"enable_unique_key_skip_bitmap_column" = "true"); -``` - -The original data in the table is: - -```sql -MySQL root@<host>:d1> select * from t1; -+---+----+----+----+----+----+ -| k | v1 | v2 | v3 | v4 | v5 | -+---+----+----+----+----+----+ -| 0 | 0 | 0 | 0 | 0 | 0 | -| 1 | 1 | 1 | 1 | 1 | 1 | -| 2 | 2 | 2 | 2 | 2 | 2 | -| 3 | 3 | 3 | 3 | 3 | 3 | -| 4 | 4 | 4 | 4 | 4 | 4 | -| 5 | 5 | 5 | 5 | 5 | 5 | -+---+----+----+----+----+----+ -``` - -Now, updating some fields using flexible column updates: - -```shell -$ cat test1.json -``` -```json -{"k": 0, "__DORIS_DELETE_SIGN__": 1} -{"k": 1, "v1": 10} -{"k": 2, "v2": 20, "v5": 25} -{"k": 3, "v3": 30} -{"k": 4, "v4": 20, "v1": 43, "v3": 99} -{"k": 5, "v5": null} -{"k": 6, "v1": 999, "v3": 777} -{"k": 2, "v4": 222} -{"k": 1, "v2": 111, "v3": 111} -``` -```shell -curl --location-trusted -u root: \ --H "strict_mode:false" \ --H "format:json" \ --H "read_json_by_line:true" \ --H "unique_key_update_mode:UPDATE_FLEXIBLE_COLUMNS" \ --T test1.json \ --XPUT http://<host>:<http_port>/api/d1/t1/_stream_load -``` - -After the update, the data in the table is: - -```sql -MySQL root@<host>:d1> select * from t1; -+---+-----+------+-----+------+--------+ -| k | v1 | v2 | v3 | v4 | v5 | -+---+-----+------+-----+------+--------+ -| 1 | 10 | 111 | 111 | 1 | 1 | -| 2 | 2 | 20 | 2 | 222 | 25 | -| 3 | 3 | 3 | 30 | 3 | 3 | -| 4 | 43 | 4 | 99 | 20 | 4 | -| 5 | 5 | 5 | 5 | 5 | <null> | -| 6 | 999 | 9876 | 777 | 1234 | <null> | -+---+-----+------+-----+------+--------+ -``` - -### Limitations and Considerations - -1. Similar to previous partial updates, flexible column updates require that each row of imported data include all key columns. Rows not meeting this requirement will be filtered out and counted in filter rows. If the number of filtered rows exceeds the `max_filter_ratio` threshold for this import, the entire import will fail, and filtered data will generate an error log. - -2. In flexible partial update loads, key-value pairs in each JSON object are only valid if the key matches a column name in the target table. Key-value pairs that do not meet this requirement will be ignored. Pairs with keys `__DORIS_VERSION_COL__`, `__DORIS_ROW_STORE_COL__`, or `__DORIS_SKIP_BITMAP_COL__` will also be ignored. - -3. If the table properties of the target table include `function_column.sequence_type`, the import can specify the value for the `__DORIS_SEQUENCE_COL__` column by including a key-value pair in the JSON object with key `__DORIS_SEQUENCE_COL__`. For rows that do not specify a value for the `__DORIS_SEQUENCE_COL__`, if the key exists in the original table, the value will be filled from the old row; otherwise, it will be set to null. - -For example, for the following table: - -```sql -CREATE TABLE t2 ( - `k` int(11) NULL, - `v1` BIGINT NULL, - `v2` BIGINT NULL DEFAULT "9876", - `v3` BIGINT NOT NULL, - `v4` BIGINT NOT NULL DEFAULT "1234", - `v5` BIGINT NULL -) UNIQUE KEY(`k`) DISTRIBUTED BY HASH(`k`) BUCKETS 1 -PROPERTIES( -"replication_num" = "3", -"enable_unique_key_merge_on_write" = "true", -"enable_unique_key_skip_bitmap_column" = "true", -"function_column.sequence_type" = "int"); -``` - -The original data in the table is: - -```sql -+---+----+----+----+----+----+----------------------+ -| k | v1 | v2 | v3 | v4 | v5 |__DORIS_SEQUENCE_COL__| -+---+----+----+----+----+----+----------------------+ -| 0 | 0 | 0 | 0 | 0 | 0 | 0 | -| 1 | 1 | 1 | 1 | 1 | 10 | 10 | -| 2 | 2 | 2 | 2 | 2 | 20 | 20 | -| 3 | 3 | 3 | 3 | 3 | 30 | 30 | -| 4 | 4 | 4 | 4 | 4 | 40 | 40 | -| 5 | 5 | 5 | 5 | 5 | 50 | 50 | -+---+----+----+----+----+----+----------------------+ -``` - -Importing data using flexible partial column updates: - -```json -{"k": 1, "v1": 111, "v5": 9, "__DORIS_SEQUENCE_COL__": 9} -{"k": 2, "v2": 222, "v5": 25, "__DORIS_SEQUENCE_COL__": 25} -{"k": 3, "v3": 333} -{"k": 4, "v4": 444, "v5": 50, "v1": 411, "v3": 433, "v2": null, "__DORIS_SEQUENCE_COL__": 50} -{"k": 5, "v5": null, "__DORIS_SEQUENCE_COL__": null} -{"k": 6, "v1": 611, "v3": 633} -{"k": 7, "v3": 733, "v5": 300, "__DORIS_SEQUENCE_COL__": 300} -``` - -The final data in the table is as follows: - -```sql -+---+--------+--------+-----+------+--------+ -| k | v1 | v2 | v3 | v4 | v5 | -+---+--------+--------+-----+------+--------+ -| 0 | 0 | 0 | 0 | 0 | 0 | -| 1 | 1 | 1 | 1 | 1 | 10 | -| 5 | 5 | 5 | 5 | 5 | 50 | -| 2 | 2 | 222 | 2 | 2 | 25 | -| 3 | 3 | 3 | 333 | 3 | 30 | -| 4 | 411 | <null> | 433 | 444 | 50 | -| 6 | 611 | 9876 | 633 | 1234 | <null> | -| 7 | <null> | 9876 | 733 | 1234 | 300 | -+---+--------+--------+-----+------+--------+ -``` - -4. 
When the table property `function_column.sequence_col` is set in the target table, the key-value pair in the JSON object for flexible column updates with the key `__DORIS_SEQUENCE_COL__` will be ignored. The value of the `__DORIS_SEQUENCE_COL__` column in a specific row during import will be the same as the final value of the column specified by the table property `function_column.sequence_col` for that row. - -For example, for the following table: - -```sql -CREATE TABLE t3 ( - `k` int(11) NULL, - `v1` BIGINT NULL, - `v2` BIGINT NULL DEFAULT "9876", - `v3` BIGINT NOT NULL, - `v4` BIGINT NOT NULL DEFAULT "1234", - `v5` BIGINT NULL DEFAULT "31" -) UNIQUE KEY(`k`) DISTRIBUTED BY HASH(`k`) BUCKETS 1 -PROPERTIES( -"replication_num" = "3", -"enable_unique_key_merge_on_write" = "true", -"enable_unique_key_skip_bitmap_column" = "true", -"function_column.sequence_col" = "v5"); -``` - -The original data in the table is as follows: - -```sql -+---+----+----+----+----+----+----------------------+ -| k | v1 | v2 | v3 | v4 | v5 |__DORIS_SEQUENCE_COL__| -+---+----+----+----+----+----+----------------------+ -| 0 | 0 | 0 | 0 | 0 | 0 | 0 | -| 1 | 1 | 1 | 1 | 1 | 10 | 10 | -| 2 | 2 | 2 | 2 | 2 | 20 | 20 | -| 3 | 3 | 3 | 3 | 3 | 30 | 30 | -| 4 | 4 | 4 | 4 | 4 | 40 | 40 | -| 5 | 5 | 5 | 5 | 5 | 50 | 50 | -+---+----+----+----+----+----+----------------------+ -``` - -Using flexible partial updates, the following data is imported: - -```json -{"k": 1, "v1": 111, "v5": 9} -{"k": 2, "v2": 222, "v5": 25} -{"k": 3, "v3": 333} -{"k": 4, "v4": 444, "v5": 50, "v1": 411, "v3": 433, "v2": null} -{"k": 5, "v5": null} -{"k": 6, "v1": 611, "v3": 633} -{"k": 7, "v3": 733, "v5": 300} -``` - -The final data in the table is as follows: - -```sql -+---+--------+--------+-----+------+-----+ -| k | v1 | v2 | v3 | v4 | v5 | -+---+--------+--------+-----+------+-----+ -| 0 | 0 | 0 | 0 | 0 | 0 | -| 1 | 1 | 1 | 1 | 1 | 10 | -| 5 | 5 | 5 | 5 | 5 | 50 | -| 2 | 2 | 222 | 2 | 2 | 25 | -| 3 | 3 | 3 | 333 | 3 | 30 | -| 4 | 411 | <null> | 433 | 444 | 50 | -| 6 | 611 | 9876 | 633 | 1234 | 31 | -| 7 | <null> | 9876 | 733 | 1234 | 300 | -+---+--------+--------+-----+------+-----+ -``` - -5. When using flexible partial updates, the following import parameters cannot be specified or enabled: - - The `merge_type` parameter cannot be specified. - - The `delete` parameter cannot be specified. - - The `fuzzy_parse` parameter cannot be enabled. - - The `columns` parameter cannot be specified. - - The `jsonpaths` parameter cannot be specified. - - The `hidden_columns` parameter cannot be specified. - - The `function_column.sequence_col` parameter cannot be specified. - - The `sql` parameter cannot be specified. - - The `memtable_on_sink_node` option cannot be enabled. - - The `group_commit` parameter cannot be specified. - - The `where` parameter cannot be specified. - -6. Flexible partial updates are not supported on tables with Variant columns. - -7. Flexible partial updates are not supported on tables with synchronous materialized views. - - -## Handling New Rows in Partial Column Updates - -The session variable or import property `partial_update_new_key_behavior` controls the behavior when inserting new rows during partial column updates. - -When `partial_update_new_key_behavior=ERROR`, each inserted row must have a key that already exists in the table. When `partial_update_new_key_behavior=APPEND`, partial column updates can update existing rows with matching keys or insert new rows with keys that do not exist in the table. 
- -For example, consider the following table structure: -```sql -CREATE TABLE user_profile -( - id INT, - name VARCHAR(10), - age INT, - city VARCHAR(10), - balance DECIMAL(9, 0), - last_access_time DATETIME -) ENGINE=OLAP -UNIQUE KEY(id) -DISTRIBUTED BY HASH(id) BUCKETS 1 -PROPERTIES ( - "enable_unique_key_merge_on_write" = "true" -); -``` - -Suppose the table contains the following data: -```sql -mysql> select * from user_profile; -+------+-------+------+----------+---------+---------------------+ -| id | name | age | city | balance | last_access_time | -+------+-------+------+----------+---------+---------------------+ -| 1 | kevin | 18 | shenzhen | 400 | 2023-07-01 12:00:00| -+------+-------+------+----------+---------+---------------------+ -``` - -If you use `Insert Into` for partial column updates with `partial_update_new_key_behavior=ERROR`, and try to insert the following data, the operation will fail because the keys `(3)` and `(18)` do not exist in the original table: -```sql -SET enable_unique_key_partial_update=true; -SET partial_update_new_key_behavior=ERROR; -INSERT INTO user_profile (id, balance, last_access_time) VALUES -(1, 500, '2023-07-03 12:00:01'), -(3, 23, '2023-07-03 12:00:02'), -(18, 9999999, '2023-07-03 12:00:03'); -(1105, "errCode = 2, detailMessage = (127.0.0.1)[INTERNAL_ERROR]tablet error: [E-7003]Can't append new rows in partial update when partial_update_new_key_behavior is ERROR. Row with key=[3] is not in table., host: 127.0.0.1") -``` - -If you use `partial_update_new_key_behavior=APPEND` and perform the same partial column update: -```sql -SET enable_unique_key_partial_update=true; -SET partial_update_new_key_behavior=APPEND; -INSERT INTO user_profile (id, balance, last_access_time) VALUES -(1, 500, '2023-07-03 12:00:01'), -(3, 23, '2023-07-03 12:00:02'), -(18, 9999999, '2023-07-03 12:00:03'); -``` - -The existing row will be updated, and two new rows will be inserted. For columns not specified in the inserted data, if a default value is defined, the default will be used; if the column is nullable, NULL will be used; otherwise, the insert will fail. - -The query result will be: -```sql -mysql> select * from user_profile; -+------+-------+------+----------+---------+---------------------+ -| id | name | age | city | balance | last_access_time | -+------+-------+------+----------+---------+---------------------+ -| 1 | kevin | 18 | shenzhen | 500 | 2023-07-03 12:00:01 | -| 3 | NULL | NULL | NULL | 23 | 2023-07-03 12:00:02 | -| 18 | NULL | NULL | NULL | 9999999 | 2023-07-03 12:00:03 | -+------+-------+------+----------+---------+---------------------+ -``` +For detailed information on partial column updates for Unique Key Model tables, including usage examples, flexible partial column updates, and handling new rows, please refer to [Partial Column Update](./partial-column-update.md#partial-column-update-for-unique-key-model). Review Comment: This reference points to `./partial-column-update.md#partial-column-update-for-unique-key-model` but the file `partial-column-update.md` does not exist in the `versioned_docs/version-4.x/data-operate/update/` directory. This will create a broken link. The referenced file with the specified anchor needs to be created. ```suggestion For detailed information on partial column updates for Unique Key Model tables, including usage examples, flexible partial column updates, and handling new rows, please refer to Partial Column Update. 
``` ########## versioned_docs/version-3.x/data-operate/import/import-way/stream-load-manual.md: ########## @@ -994,6 +994,6 @@ The strict_mode attribute is used to set whether the import task runs in strict ### Perform partial column updates/flexible partial update during import -For how to express partial column updates during import, please refer to the Data Manipulation/Data Update documentation. +For how to express partial column updates during import, please refer to the [Partial Column Update](../../../data-operate/update/partial-column-update) documentation. Review Comment: This reference points to `../../../data-operate/update/partial-column-update` but the file does not exist in the corresponding directory `versioned_docs/version-3.x/data-operate/update/partial-column-update.md`. This will create a broken link. ```suggestion For how to express partial column updates during import, please refer to the Partial Column Update documentation. ``` ########## docs/data-operate/import/import-way/stream-load-manual.md: ########## @@ -1024,6 +1024,6 @@ The strict_mode attribute is used to set whether the import task runs in strict ### Perform partial column updates/flexible partial update during import -For how to express partial column updates during import, please refer to the Data Manipulation/Data Update documentation. +For how to express partial column updates during import, please refer to the [Partial Column Update](../../../data-operate/update/partial-column-update.md) documentation. Review Comment: This reference points to `../../../data-operate/update/partial-column-update.md` but that file does not exist in the `docs/data-operate/update/` directory. This will create a broken link. ```suggestion For how to express partial column updates during import, please refer to the Partial Column Update documentation. ``` ########## versioned_docs/version-4.x/data-operate/import/import-way/stream-load-manual.md: ########## @@ -1024,6 +1024,6 @@ The strict_mode attribute is used to set whether the import task runs in strict ### Perform partial column updates/flexible partial update during import -For how to express partial column updates during import, please refer to the Data Manipulation/Data Update documentation. +For how to express partial column updates during import, please refer to the [Partial Column Update](../../../data-operate/update/partial-column-update) documentation. Review Comment: This reference points to `../../../data-operate/update/partial-column-update` but the file does not exist in the corresponding directory `versioned_docs/version-4.x/data-operate/update/partial-column-update.md`. This will create a broken link. ```suggestion For how to express partial column updates during import, please refer to the Partial Column Update documentation. 
``` ########## i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/import/import-way/routine-load-manual.md: ########## @@ -417,7 +417,7 @@ job_properties 子句具体参数选项如下: | strip_outer_array | 当导入数据格式为 json 时,strip_outer_array 为 true 表示 JSON 数据以数组的形式展现,数据中的每一个元素将被视为一行数据。默认值是 false。通常情况下,Kafka 中的 JSON 数据可能以数组形式表示,即在最外层中包含中括号`[]`,此时,可以指定 `"strip_outer_array" = "true"`,以数组模式消费 Topic 中的数据。如以下数据会被解析成两行:`[{"user_id":1,"name":"Emily","age":25},{"user_id":2,"name":"Benjamin","age":35}]` | | send_batch_parallelism | 用于设置发送批量数据的并行度。如果并行度的值超过 BE 配置中的 `max_send_batch_parallelism_per_job`,那么作为协调点的 BE 将使用 `max_send_batch_parallelism_per_job` 的值。 | | load_to_single_tablet | 支持一个任务只导入数据到对应分区的一个 tablet,默认值为 false,该参数只允许在对带有 random 分桶的 olap 表导数的时候设置。 | -| partial_columns | 指定是否开启部分列更新功能。默认值为 false。该参数只允许在表模型为 Unique 且采用 Merge on Write 时设置。一流多表不支持此参数。具体参考文档[部分列更新](../../../data-operate/update/update-of-unique-model) | +| partial_columns | 指定是否开启部分列更新功能。默认值为 false。该参数只允许在表模型为 Unique 且采用 Merge on Write 时设置。一流多表不支持此参数。具体参考文档[列更新](../../../data-operate/update/partial-column-update.md) | Review Comment: This reference points to `../../../data-operate/update/partial-column-update.md` but that file does not exist in the `i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/` directory. This will create a broken link. ```suggestion | partial_columns | 指定是否开启部分列更新功能。默认值为 false。该参数只允许在表模型为 Unique 且采用 Merge on Write 时设置。一流多表不支持此参数。具体参考文档列更新。 | ``` ########## versioned_sidebars/version-3.x-sidebars.json: ########## @@ -225,9 +225,10 @@ "label": "Updating Data", "items": [ "data-operate/update/update-overview", - "data-operate/update/unique-update", + "data-operate/update/unique-update-sql", Review Comment: The sidebar references `"data-operate/update/unique-update-sql"` but this file does not exist in `versioned_docs/version-3.x/data-operate/update/`. The renamed file needs to be added to the 3.x version documentation. ```suggestion ``` ########## versioned_docs/version-3.x/data-operate/update/update-of-unique-model.md: ########## @@ -14,455 +14,4 @@ When loading data into the unique key model using Doris-supported methods like S ## Partial Column Update -Partial column update allows you to update specific fields in a table without modifying all fields. You can use the Update statement to perform this operation, which typically involves reading the entire row, updating the desired fields, and writing it back. This read-write transaction is time-consuming and not suitable for large-scale data writing. Doris provides a feature to directly insert or update partial column data in the unique key model load update, bypassing the need to read the entire row first, thus significantly improving update efficiency. - -:::caution Note - -1. Version 2.0 only supports partial column updates in the Merge-on-Write implementation of the Unique Key. -2. Starting from version 2.0.2, partial column updates are supported using INSERT INTO. -3. Partial column updates are not supported on tables with synchronized materialized views. -4. Partial column updates are not allowed on tables doing schema change. -::: - -### Applicable Scenarios - -- Real-time dynamic column updates, requiring frequent updates of specific fields in the table. For example, updating fields related to the latest user behavior in a user tag table for real-time analysis and decision-making in advertising/recommendation systems. -- Merging multiple source tables into one large wide table. -- Data correction. 
- -### Usage Example - -Assume there is an order table `order_tbl` in Doris, where the order id is the Key column, and the order status and order amount are the Value columns. The data status is as follows: - -| Order id | Order Amount | Order Status | -| -------- | -------------| -------------| -| 1 | 100 | Pending Payment | - -```sql -+----------+--------------+--------------+ -| order_id | order_amount | order_status | -+----------+--------------+--------------+ -| 1 | 100 | Pending Payment | -+----------+--------------+--------------+ -1 row in set (0.01 sec) -``` - -After the user clicks to pay, the Doris system needs to change the order status of the order with order id '1' to 'Pending Shipment'. - -#### Partial Column Update Using Load Methods - -**StreamLoad/BrokerLoad/RoutineLoad** - -Prepare the following CSV file: - -``` -1,Pending Shipment -``` - -Add the following header during load: - -```sql -partial_columns:true -``` - -Specify the columns to be loaded in `columns` (must include all key columns). Below is an example of Stream Load: - -```sql -curl --location-trusted -u root: -H "partial_columns:true" -H "column_separator:," -H "columns:order_id,order_status" -T /tmp/update.csv http://127.0.0.1:8030/api/db1/order_tbl/_stream_load -``` - -**INSERT INTO** - -In all data models, the default behavior of `INSERT INTO` when given partial columns is to write the entire row. To prevent misuse, in the Merge-on-Write implementation, `INSERT INTO` maintains the semantics of whole row UPSERT by default. To enable partial column updates, set the following session variable: - -```sql -SET enable_unique_key_partial_update=true; -INSERT INTO order_tbl (order_id, order_status) VALUES (1, 'Pending Shipment'); -``` - -Note that the session variable `enable_insert_strict` defaults to true, enabling strict mode by default. In strict mode, partial column updates do not allow updating non-existent keys. To insert non-existent keys using the insert statement for partial column updates, set `enable_unique_key_partial_update` to true and `enable_insert_strict` to false. - -**Flink Connector** - -If using Flink Connector, add the following configuration: - -```sql -'sink.properties.partial_columns' = 'true', -``` - -Specify the columns to be loaded in `sink.properties.column` (must include all key columns). - -#### Update Result - -The result after the update is as follows: - -```sql -+----------+--------------+--------------+ -| order_id | order_amount | order_status | -+----------+--------------+--------------+ -| 1 | 100 | Pending Shipment | -+----------+--------------+--------------+ -1 row in set (0.01 sec) -``` - -### Usage Notes - -Since the Merge-on-Write implementation needs to complete the entire row of data during writing to ensure optimal query performance, using it for partial column updates may decrease partial load performance. - -Performance optimization suggestions: - -- Use SSDs equipped with NVMe or high-speed SSD cloud disks, as completing data will read a large amount of historical data, generating high read IOPS and throughput. -- Enabling row storage can reduce the IOPS generated when completing data, significantly improving load performance. Enable row storage by setting the following property when creating a table: - -```Plain -"store_row_column" = "true" -``` - -Currently, all rows in the same batch data writing task (whether a load task or `INSERT INTO`) can only update the same columns. To update data with different columns, write in different batches. 
- -Future versions will support flexible column updates, allowing users to update different columns for each row in the same batch load. - -## Flexible Partial Column Updates - -> Since 3.1.0 - -Previously, Doris's partial update feature required that every row in an import update the same columns. Now, Doris supports a more flexible partial update method that allows each row in a single import to update different columns. - -:::caution Note: - -1. Currently, only the Stream Load import method and tools using Stream Load (e.g. Doris-Flink-Connector) support this feature. -2. The import file must be in JSON format when using flexible column updates. -::: - -### Applicable Scenarios - -When using CDC to synchronize data from a database system to Doris in real-time, the records output by the source system may not contain complete row data, but only the values of the primary keys and the updated columns. In such cases, the columns updated in a batch of data within a time window may differ. Flexible column updates can be used to import data into Doris. - -### Usage - -**Enabling Flexible Column Updates for Existing Tables** - -For existing Merge-On-Write tables created in old versions of Doris, after upgrading, you can enable flexible partial updates using the command: `ALTER TABLE db1.tbl1 ENABLE FEATURE "UPDATE_FLEXIBLE_COLUMNS";`. After executing this command, if the result of `show create table db1.tbl1` includes `"enable_unique_key_skip_bitmap_column" = "true"`, the feature has been successfully enabled. Ensure that the target table has the light-schema-change feature enabled beforehand. - -**Using Flexible Column Updates for New Tables** - -For new tables, to use the flexible column update feature, specify the following table properties when creating the table to enable Merge-on-Write and include the required hidden bitmap column for flexible column updates: - -```Plain -"enable_unique_key_merge_on_write" = "true" -"enable_unique_key_skip_bitmap_column" = "true" -``` - -**StreamLoad** - -When using Stream Load, add the following header: - -```Plain -unique_key_update_mode:UPDATE_FLEXIBLE_COLUMNS -``` - -**Flink Doris Connector** - -If using the Flink Doris Connector, add the following configuration: - -```Plain -'sink.properties.unique_key_update_mode' = 'UPDATE_FLEXIBLE_COLUMNS' -``` - -### Example - -Assuming the following table: - -```sql -CREATE TABLE t1 ( - `k` int(11) NULL, - `v1` BIGINT NULL, - `v2` BIGINT NULL DEFAULT "9876", - `v3` BIGINT NOT NULL, - `v4` BIGINT NOT NULL DEFAULT "1234", - `v5` BIGINT NULL -) UNIQUE KEY(`k`) DISTRIBUTED BY HASH(`k`) BUCKETS 1 -PROPERTIES( -"replication_num" = "3", -"enable_unique_key_merge_on_write" = "true", -"enable_unique_key_skip_bitmap_column" = "true"); -``` - -The original data in the table is: - -```sql -MySQL [email protected]:d1> select * from t1; -+---+----+----+----+----+----+ -| k | v1 | v2 | v3 | v4 | v5 | -+---+----+----+----+----+----+ -| 0 | 0 | 0 | 0 | 0 | 0 | -| 1 | 1 | 1 | 1 | 1 | 1 | -| 2 | 2 | 2 | 2 | 2 | 2 | -| 3 | 3 | 3 | 3 | 3 | 3 | -| 4 | 4 | 4 | 4 | 4 | 4 | -| 5 | 5 | 5 | 5 | 5 | 5 | -+---+----+----+----+----+----+ -``` - -Now, updating some fields using flexible column updates: - -```shell -$ cat test1.json -``` -```json -{"k": 0, "__DORIS_DELETE_SIGN__": 1} -{"k": 1, "v1": 10} -{"k": 2, "v2": 20, "v5": 25} -{"k": 3, "v3": 30} -{"k": 4, "v4": 20, "v1": 43, "v3": 99} -{"k": 5, "v5": null} -{"k": 6, "v1": 999, "v3": 777} -{"k": 2, "v4": 222} -{"k": 1, "v2": 111, "v3": 111} -``` -```shell -curl --location-trusted -u root: 
\
--H "strict_mode:false" \
--H "format:json" \
--H "read_json_by_line:true" \
--H "unique_key_update_mode:UPDATE_FLEXIBLE_COLUMNS" \
--T test1.json \
--XPUT http://<host>:<http_port>/api/d1/t1/_stream_load
-```
-
-After the update, the data in the table is:
-
-```sql
-MySQL [email protected]:d1> select * from t1;
-+---+-----+------+-----+------+--------+
-| k | v1  | v2   | v3  | v4   | v5     |
-+---+-----+------+-----+------+--------+
-| 1 | 10  | 111  | 111 | 1    | 1      |
-| 2 | 2   | 20   | 2   | 222  | 25     |
-| 3 | 3   | 3    | 30  | 3    | 3      |
-| 4 | 43  | 4    | 99  | 20   | 4      |
-| 5 | 5   | 5    | 5   | 5    | <null> |
-| 6 | 999 | 9876 | 777 | 1234 | <null> |
-+---+-----+------+-----+------+--------+
-```
-
-### Limitations and Considerations
-
-1. Similar to previous partial updates, flexible column updates require that each row of imported data include all key columns. Rows not meeting this requirement will be filtered out and counted toward the number of filtered rows. If the number of filtered rows exceeds the `max_filter_ratio` threshold for this import, the entire import will fail, and the filtered data will generate an error log.
-
-2. In flexible partial update loads, key-value pairs in each JSON object are only valid if the key matches a column name in the target table. Key-value pairs that do not meet this requirement will be ignored. Pairs with keys `__DORIS_VERSION_COL__`, `__DORIS_ROW_STORE_COL__`, or `__DORIS_SKIP_BITMAP_COL__` will also be ignored.
-
-3. If the table properties of the target table include `function_column.sequence_type`, the import can specify the value for the `__DORIS_SEQUENCE_COL__` column by including a key-value pair in the JSON object with key `__DORIS_SEQUENCE_COL__`. For rows that do not specify a value for the `__DORIS_SEQUENCE_COL__` column, if the key exists in the original table, the value will be filled from the old row; otherwise, it will be set to null.
- 
-For example, for the following table:
-
-```sql
-CREATE TABLE t2 (
-    `k` int(11) NULL, 
-    `v1` BIGINT NULL,
-    `v2` BIGINT NULL DEFAULT "9876",
-    `v3` BIGINT NOT NULL,
-    `v4` BIGINT NOT NULL DEFAULT "1234",
-    `v5` BIGINT NULL
-) UNIQUE KEY(`k`) DISTRIBUTED BY HASH(`k`) BUCKETS 1
-PROPERTIES(
-"replication_num" = "3",
-"enable_unique_key_merge_on_write" = "true",
-"enable_unique_key_skip_bitmap_column" = "true",
-"function_column.sequence_type" = "int");
-```
-
-The original data in the table is:
-
-```sql
-+---+----+----+----+----+----+----------------------+
-| k | v1 | v2 | v3 | v4 | v5 |__DORIS_SEQUENCE_COL__|
-+---+----+----+----+----+----+----------------------+
-| 0 | 0  | 0  | 0  | 0  | 0  | 0                    |
-| 1 | 1  | 1  | 1  | 1  | 10 | 10                   |
-| 2 | 2  | 2  | 2  | 2  | 20 | 20                   |
-| 3 | 3  | 3  | 3  | 3  | 30 | 30                   |
-| 4 | 4  | 4  | 4  | 4  | 40 | 40                   |
-| 5 | 5  | 5  | 5  | 5  | 50 | 50                   |
-+---+----+----+----+----+----+----------------------+
-```
-
-Importing data using flexible partial column updates:
-
-```json
-{"k": 1, "v1": 111, "v5": 9, "__DORIS_SEQUENCE_COL__": 9}
-{"k": 2, "v2": 222, "v5": 25, "__DORIS_SEQUENCE_COL__": 25}
-{"k": 3, "v3": 333}
-{"k": 4, "v4": 444, "v5": 50, "v1": 411, "v3": 433, "v2": null, "__DORIS_SEQUENCE_COL__": 50}
-{"k": 5, "v5": null, "__DORIS_SEQUENCE_COL__": null}
-{"k": 6, "v1": 611, "v3": 633}
-{"k": 7, "v3": 733, "v5": 300, "__DORIS_SEQUENCE_COL__": 300}
-```
-
-The final data in the table is as follows:
-
-```sql
-+---+--------+--------+-----+------+--------+
-| k | v1     | v2     | v3  | v4   | v5     |
-+---+--------+--------+-----+------+--------+
-| 0 | 0      | 0      | 0   | 0    | 0      |
-| 1 | 1      | 1      | 1   | 1    | 10     |
-| 5 | 5      | 5      | 5   | 5    | 50     |
-| 2 | 2      | 222    | 2   | 2    | 25     |
-| 3 | 3      | 3      | 333 | 3    | 30     |
-| 4 | 411    | <null> | 433 | 444  | 50     |
-| 6 | 611    | 9876   | 633 | 1234 | <null> |
-| 7 | <null> | 9876   | 733 | 1234 | 300    |
-+---+--------+--------+-----+------+--------+
-```
-
-4. When the table property `function_column.sequence_col` is set in the target table, the key-value pair in the JSON object for flexible column updates with the key `__DORIS_SEQUENCE_COL__` will be ignored. The value of the `__DORIS_SEQUENCE_COL__` column in a specific row during import will be the same as the final value of the column specified by the table property `function_column.sequence_col` for that row.
- -For example, for the following table: - -```sql -CREATE TABLE t3 ( - `k` int(11) NULL, - `v1` BIGINT NULL, - `v2` BIGINT NULL DEFAULT "9876", - `v3` BIGINT NOT NULL, - `v4` BIGINT NOT NULL DEFAULT "1234", - `v5` BIGINT NULL DEFAULT "31" -) UNIQUE KEY(`k`) DISTRIBUTED BY HASH(`k`) BUCKETS 1 -PROPERTIES( -"replication_num" = "3", -"enable_unique_key_merge_on_write" = "true", -"enable_unique_key_skip_bitmap_column" = "true", -"function_column.sequence_col" = "v5"); -``` - -The original data in the table is as follows: - -```sql -+---+----+----+----+----+----+----------------------+ -| k | v1 | v2 | v3 | v4 | v5 |__DORIS_SEQUENCE_COL__| -+---+----+----+----+----+----+----------------------+ -| 0 | 0 | 0 | 0 | 0 | 0 | 0 | -| 1 | 1 | 1 | 1 | 1 | 10 | 10 | -| 2 | 2 | 2 | 2 | 2 | 20 | 20 | -| 3 | 3 | 3 | 3 | 3 | 30 | 30 | -| 4 | 4 | 4 | 4 | 4 | 40 | 40 | -| 5 | 5 | 5 | 5 | 5 | 50 | 50 | -+---+----+----+----+----+----+----------------------+ -``` - -Using flexible partial updates, the following data is imported: - -```json -{"k": 1, "v1": 111, "v5": 9} -{"k": 2, "v2": 222, "v5": 25} -{"k": 3, "v3": 333} -{"k": 4, "v4": 444, "v5": 50, "v1": 411, "v3": 433, "v2": null} -{"k": 5, "v5": null} -{"k": 6, "v1": 611, "v3": 633} -{"k": 7, "v3": 733, "v5": 300} -``` - -The final data in the table is as follows: - -```sql -+---+--------+--------+-----+------+-----+ -| k | v1 | v2 | v3 | v4 | v5 | -+---+--------+--------+-----+------+-----+ -| 0 | 0 | 0 | 0 | 0 | 0 | -| 1 | 1 | 1 | 1 | 1 | 10 | -| 5 | 5 | 5 | 5 | 5 | 50 | -| 2 | 2 | 222 | 2 | 2 | 25 | -| 3 | 3 | 3 | 333 | 3 | 30 | -| 4 | 411 | <null> | 433 | 444 | 50 | -| 6 | 611 | 9876 | 633 | 1234 | 31 | -| 7 | <null> | 9876 | 733 | 1234 | 300 | -+---+--------+--------+-----+------+-----+ -``` - -5. When using flexible partial updates, the following import parameters cannot be specified or enabled: - - The `merge_type` parameter cannot be specified. - - The `delete` parameter cannot be specified. - - The `fuzzy_parse` parameter cannot be enabled. - - The `columns` parameter cannot be specified. - - The `jsonpaths` parameter cannot be specified. - - The `hidden_columns` parameter cannot be specified. - - The `function_column.sequence_col` parameter cannot be specified. - - The `sql` parameter cannot be specified. - - The `memtable_on_sink_node` option cannot be enabled. - - The `group_commit` parameter cannot be specified. - - The `where` parameter cannot be specified. - -6. Flexible partial updates are not supported on tables with Variant columns. - -7. Flexible partial updates are not supported on tables with synchronous materialized views. - - -## Handling New Rows in Partial Column Updates - -The session variable or import property `partial_update_new_key_behavior` controls the behavior when inserting new rows during partial column updates. - -When `partial_update_new_key_behavior=ERROR`, each inserted row must have a key that already exists in the table. When `partial_update_new_key_behavior=APPEND`, partial column updates can update existing rows with matching keys or insert new rows with keys that do not exist in the table. 
- -For example, consider the following table structure: -```sql -CREATE TABLE user_profile -( - id INT, - name VARCHAR(10), - age INT, - city VARCHAR(10), - balance DECIMAL(9, 0), - last_access_time DATETIME -) ENGINE=OLAP -UNIQUE KEY(id) -DISTRIBUTED BY HASH(id) BUCKETS 1 -PROPERTIES ( - "enable_unique_key_merge_on_write" = "true" -); -``` - -Suppose the table contains the following data: -```sql -mysql> select * from user_profile; -+------+-------+------+----------+---------+---------------------+ -| id | name | age | city | balance | last_access_time | -+------+-------+------+----------+---------+---------------------+ -| 1 | kevin | 18 | shenzhen | 400 | 2023-07-01 12:00:00| -+------+-------+------+----------+---------+---------------------+ -``` - -If you use `Insert Into` for partial column updates with `partial_update_new_key_behavior=ERROR`, and try to insert the following data, the operation will fail because the keys `(3)` and `(18)` do not exist in the original table: -```sql -SET enable_unique_key_partial_update=true; -SET partial_update_new_key_behavior=ERROR; -INSERT INTO user_profile (id, balance, last_access_time) VALUES -(1, 500, '2023-07-03 12:00:01'), -(3, 23, '2023-07-03 12:00:02'), -(18, 9999999, '2023-07-03 12:00:03'); -(1105, "errCode = 2, detailMessage = (127.0.0.1)[INTERNAL_ERROR]tablet error: [E-7003]Can't append new rows in partial update when partial_update_new_key_behavior is ERROR. Row with key=[3] is not in table., host: 127.0.0.1") -``` - -If you use `partial_update_new_key_behavior=APPEND` and perform the same partial column update: -```sql -SET enable_unique_key_partial_update=true; -SET partial_update_new_key_behavior=APPEND; -INSERT INTO user_profile (id, balance, last_access_time) VALUES -(1, 500, '2023-07-03 12:00:01'), -(3, 23, '2023-07-03 12:00:02'), -(18, 9999999, '2023-07-03 12:00:03'); -``` - -The existing row will be updated, and two new rows will be inserted. For columns not specified in the inserted data, if a default value is defined, the default will be used; if the column is nullable, NULL will be used; otherwise, the insert will fail. - -The query result will be: -```sql -mysql> select * from user_profile; -+------+-------+------+----------+---------+---------------------+ -| id | name | age | city | balance | last_access_time | -+------+-------+------+----------+---------+---------------------+ -| 1 | kevin | 18 | shenzhen | 500 | 2023-07-03 12:00:01 | -| 3 | NULL | NULL | NULL | 23 | 2023-07-03 12:00:02 | -| 18 | NULL | NULL | NULL | 9999999 | 2023-07-03 12:00:03 | -+------+-------+------+----------+---------+---------------------+ -``` +For detailed information on partial column updates for Unique Key Model tables, including usage examples, flexible partial column updates, and handling new rows, please refer to [Partial Column Update](./partial-column-update.md#partial-column-update-for-unique-key-model). Review Comment: This reference points to `./partial-column-update.md#partial-column-update-for-unique-key-model` but the file `partial-column-update.md` does not exist in the `versioned_docs/version-3.x/data-operate/update/` directory. This will create a broken link. ```suggestion For detailed information on partial column updates for Unique Key Model tables, including usage examples, flexible partial column updates, and handling new rows, please refer to the partial column update documentation. 
``` ########## i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/update-of-unique-model.md: ########## @@ -14,444 +14,5 @@ ## 部分列更新 -部分列更新是指直接更新表中某些字段值,而不是全部字段值。可以使用 Update 语句进行更新,这种 Update 语句通常先读取整行数据,然后更新部分字段值,再写回。这种读写事务非常耗时,不适合大批量数据写入。Doris 在主键模型的导入更新中,提供了直接插入或更新部分列数据的功能,不需要先读取整行数据,从而大幅提升更新效率。 - -:::caution 注意 - -1. 2.0 版本仅在 Unique Key 的 Merge-on-Write 实现中支持部分列更新能力。 -2. 从 2.0.2 版本开始,支持使用 INSERT INTO 进行部分列更新。 -3. 不支持在有同步物化视图的表上进行部分列更新。 -4. 不支持在进行 Schema Change 的表上进行部分列更新。 -::: - -### 适用场景 - -- 实时动态列更新,需要在表中实时高频更新某些字段值。例如用户标签表中有一些关于用户最新行为信息的字段需要实时更新,以便广告/推荐系统能够据此进行实时分析和决策。 -- 将多张源表拼接成一张大宽表。 -- 数据修正。 - -### 使用示例 - -假设 Doris 中存在一张订单表 order_tbl,其中订单 id 是 Key 列,订单状态和订单金额是 Value 列。数据状态如下: - -| 订单 id | 订单金额 | 订单状态 | -| ------ | -------- | -------- | -| 1 | 100 | 待付款 | - -```sql -+----------+--------------+--------------+ -| order_id | order_amount | order_status | -+----------+--------------+--------------+ -| 1 | 100 | 待付款 | -+----------+--------------+--------------+ -1 row in set (0.01 sec) -``` - -此时,用户点击付款后,Doris 系统需要将订单 id 为 '1' 的订单状态变更为 '待发货'。 - -#### 可以使用以下导入方式进行部分列更新 - -**StreamLoad/BrokerLoad/RoutineLoad** - -准备如下 csv 文件: - -``` -1,待发货 -``` - -在导入时添加如下 header: - -```sql -partial_columns:true -``` - -同时在 `columns` 中指定要导入的列(必须包含所有 key 列,否则无法更新)。下面是一个 Stream Load 的例子: - -```sql -curl --location-trusted -u root: -H "partial_columns:true" -H "column_separator:," -H "columns:order_id,order_status" -T /tmp/update.csv http://127.0.0.1:8030/api/db1/order_tbl/_stream_load -``` - -**INSERT INTO** - -在所有数据模型中,`INSERT INTO` 给定部分列时默认行为是整行写入。为了防止误用,在 Merge-on-Write 实现中,`INSERT INTO` 默认仍然保持整行 UPSERT 的语义。如果需要开启部分列更新的语义,需要设置如下 session variable: - -```sql -SET enable_unique_key_partial_update=true; -INSERT INTO order_tbl (order_id, order_status) VALUES (1, '待发货'); -``` - -需要注意的是,控制 insert 语句是否开启严格模式的会话变量 `enable_insert_strict` 的默认值为 true,即 insert 语句默认开启严格模式。在严格模式下进行部分列更新不允许更新不存在的 key。所以,在使用 insert 语句进行部分列更新时,如果希望能插入不存在的 key,需要在 `enable_unique_key_partial_update` 设置为 true 的基础上,同时将 `enable_insert_strict` 设置为 false。 - -**Flink Connector** - -如果使用 Flink Connector,需要添加如下配置: - -```sql -'sink.properties.partial_columns' = 'true', -``` - -同时在 `sink.properties.column` 中指定要导入的列(必须包含所有 key 列,否则无法更新)。 - -#### 更新结果 - -更新后结果如下: - -```sql -+----------+--------------+--------------+ -| order_id | order_amount | order_status | -+----------+--------------+--------------+ -| 1 | 100 | 待发货 | -+----------+--------------+--------------+ -1 row in set (0.01 sec) -``` - -### 使用注意 - -由于 Merge-on-Write 实现需要在数据写入时进行整行数据的补齐,以保证最优的查询性能,因此使用 Merge-on-Write 实现进行部分列更新会导致部分导入性能下降。 - -写入性能优化建议: - -- 使用配备 NVMe 的 SSD,或者极速 SSD 云盘。因为补齐数据时会大量读取历史数据,产生较高的读 IOPS 以及读吞吐。 -- 开启行存能够大大减少补齐数据时产生的 IOPS,导入性能提升明显。用户可以在建表时通过如下 property 来开启行存: - -```Plain -"store_row_column" = "true" -``` - -目前,同一批次数据写入任务(无论是导入任务还是 `INSERT INTO`)的所有行只能更新相同的列。如果需要更新不同列的数据,则需要分不同批次进行写入。 - -## 灵活部分列更新 - -此前,doris 支持的部分列更新功能限制了一次导入中每一行必须更新相同的列。现在,doris 支持一种更加灵活的更新方式,它使得一次导入中的每一行可以更新不同的列(3.1.0版本及以上支持)。 - -:::caution 注意: - -1. 目前只有 stream load 这一种导入方式以及使用 stream load 作为其导入方式的工具 (如 doris-flink-connector) 支持灵活列更新功能 -2. 
在使用灵活列更新时导入文件必须为 json 格式的数据
-:::
-
-### 适用场景
-
-在使用 CDC 的方式将某个数据系统的数据实时同步到 Doris 中时,源端系统输出的记录可能并不是完整的行数据,而是只有主键和被更新的列的数据。在这种情况下,某个时间窗口内的一批数据中每一行更新的列可能都是不同的。此时,可以使用灵活列更新的方式来将数据导入到 Doris 中。
-
-### 使用方式
-
-**存量表开启灵活列更新功能**
-
-对于在旧版本 Doris 中已经建好的存量 Merge-On-Write 表,在升级 Doris 之后如果想要使用灵活列更新的功能,可以使用 `ALTER TABLE db1.tbl1 ENABLE FEATURE "UPDATE_FLEXIBLE_COLUMNS";` 来开启这一功能。执行完上述语句后使用 `show create table db1.tbl1` 的结果中如果包含 `"enable_unique_key_skip_bitmap_column" = "true"` 则表示功能开启成功。注意,使用这一方式之前需要确保目标表已经开启了 light-schema-change 的功能。
-
-**新建表使用灵活列更新功能**
-
-对于新建的表,如果需要使用灵活列更新功能,建表时需要指定如下表属性,以开启 Merge-on-Write 实现,同时使得表具有灵活列更新所需要的 `bitmap` 隐藏列。
-
-```Plain
-"enable_unique_key_merge_on_write" = "true"
-"enable_unique_key_skip_bitmap_column" = "true"
-```
-
-**StreamLoad**
-
-在使用 Stream Load 导入时添加如下 header
-
-```Plain
-unique_key_update_mode:UPDATE_FLEXIBLE_COLUMNS
-```
-
-**Flink Doris Connector**
-
-如果使用 Flink Doris Connector,需要添加如下配置:
-
-```Plain
-'sink.properties.unique_key_update_mode' = 'UPDATE_FLEXIBLE_COLUMNS'
-```
-
-### 示例
-
-假设有如下表
-
-```sql
-CREATE TABLE t1 (
-    `k` int(11) NULL, 
-    `v1` BIGINT NULL,
-    `v2` BIGINT NULL DEFAULT "9876",
-    `v3` BIGINT NOT NULL,
-    `v4` BIGINT NOT NULL DEFAULT "1234",
-    `v5` BIGINT NULL
-) UNIQUE KEY(`k`) DISTRIBUTED BY HASH(`k`) BUCKETS 1
-PROPERTIES(
-"replication_num" = "3",
-"enable_unique_key_merge_on_write" = "true",
-"enable_unique_key_skip_bitmap_column" = "true");
-```
-
-表中有如下原始数据
-
-```sql
-MySQL [email protected]:d1> select * from t1;
-+---+----+----+----+----+----+
-| k | v1 | v2 | v3 | v4 | v5 |
-+---+----+----+----+----+----+
-| 0 | 0  | 0  | 0  | 0  | 0  |
-| 1 | 1  | 1  | 1  | 1  | 1  |
-| 2 | 2  | 2  | 2  | 2  | 2  |
-| 3 | 3  | 3  | 3  | 3  | 3  |
-| 4 | 4  | 4  | 4  | 4  | 4  |
-| 5 | 5  | 5  | 5  | 5  | 5  |
-+---+----+----+----+----+----+
-```
-
-现在通过灵活列更新导入来更新其中的一些行的字段
-
-```shell
-$ cat test1.json
-```
-```json
-{"k": 0, "__DORIS_DELETE_SIGN__": 1}
-{"k": 1, "v1": 10}
-{"k": 2, "v2": 20, "v5": 25}
-{"k": 3, "v3": 30}
-{"k": 4, "v4": 20, "v1": 43, "v3": 99}
-{"k": 5, "v5": null}
-{"k": 6, "v1": 999, "v3": 777}
-{"k": 2, "v4": 222}
-{"k": 1, "v2": 111, "v3": 111}
-```
-```shell
-curl --location-trusted -u root: \
--H "strict_mode:false" \
--H "format:json" \
--H "read_json_by_line:true" \
--H "unique_key_update_mode:UPDATE_FLEXIBLE_COLUMNS" \
--T test1.json \
--XPUT http://<host>:<http_port>/api/d1/t1/_stream_load
-```
-
-更新后表中的数据如下:
-
-```sql
-MySQL [email protected]:d1> select * from t1;
-+---+-----+------+-----+------+--------+
-| k | v1  | v2   | v3  | v4   | v5     |
-+---+-----+------+-----+------+--------+
-| 1 | 10  | 111  | 111 | 1    | 1      |
-| 2 | 2   | 20   | 2   | 222  | 25     |
-| 3 | 3   | 3    | 30  | 3    | 3      |
-| 4 | 43  | 4    | 99  | 20   | 4      |
-| 5 | 5   | 5    | 5   | 5    | <null> |
-| 6 | 999 | 9876 | 777 | 1234 | <null> |
-+---+-----+------+-----+------+--------+
-```
-
-### 限制与注意事项
-
-1. 和之前的部分列更新相同,灵活列更新要求导入的每一行数据需要包括所有的 Key 列,不满足这一要求的行数据将被过滤掉,同时计入 `filter rows` 的计数中,如果 `filtered rows` 的数量超过了本次导入 `max_filter_ratio` 所能容忍的上限,则整个导入将会失败。同时,被过滤的数据会在 error log 留下一条日志。
-
-2. 灵活列更新导入中每一个 json 对象中的键值对只有当它的 Key 和目标表中某一列的列名一致时才是有效的,不满足这一要求的键值对将被忽略。同时,Key 为 `__DORIS_VERSION_COL__`/`__DORIS_ROW_STORE_COL__`/`__DORIS_SKIP_BITMAP_COL__` 的键值对也将被忽略。
-
-3. 
当目标表的表属性中设置了 `function_column.sequence_type` 这一属性时,灵活列更新的导入可以通过在 json 对象中包括 Key 为 `__DORIS_SEQUENCE_COL__` 的键值对来指定目标表中 `__DORIS_SEQUENCE_COL__` 列的值。对于不指定 `__DORIS_SEQUENCE_COL__` 列的值的行,如果这一行的 Key 在原表中存在,则这一行 `__DORIS_SEQUENCE_COL__` 列的值将被填充为旧行中对应的值,否则该列的值将被填充为 `null` 值。
-
-例如,对于下表:
-```sql
-CREATE TABLE t2 (
-    `k` int(11) NULL, 
-    `v1` BIGINT NULL,
-    `v2` BIGINT NULL DEFAULT "9876",
-    `v3` BIGINT NOT NULL,
-    `v4` BIGINT NOT NULL DEFAULT "1234",
-    `v5` BIGINT NULL
-) UNIQUE KEY(`k`) DISTRIBUTED BY HASH(`k`) BUCKETS 1
-PROPERTIES(
-"replication_num" = "3",
-"enable_unique_key_merge_on_write" = "true",
-"enable_unique_key_skip_bitmap_column" = "true",
-"function_column.sequence_type" = "int");
-```
-
-表中有如下原始数据:
-```sql
-+---+----+----+----+----+----+----------------------+
-| k | v1 | v2 | v3 | v4 | v5 |__DORIS_SEQUENCE_COL__|
-+---+----+----+----+----+----+----------------------+
-| 0 | 0  | 0  | 0  | 0  | 0  | 0                    |
-| 1 | 1  | 1  | 1  | 1  | 10 | 10                   |
-| 2 | 2  | 2  | 2  | 2  | 20 | 20                   |
-| 3 | 3  | 3  | 3  | 3  | 30 | 30                   |
-| 4 | 4  | 4  | 4  | 4  | 40 | 40                   |
-| 5 | 5  | 5  | 5  | 5  | 50 | 50                   |
-+---+----+----+----+----+----+----------------------+
-```
-
-通过灵活列更新导入如下数据:
-```json
-{"k": 1, "v1": 111, "v5": 9, "__DORIS_SEQUENCE_COL__": 9}
-{"k": 2, "v2": 222, "v5": 25, "__DORIS_SEQUENCE_COL__": 25}
-{"k": 3, "v3": 333}
-{"k": 4, "v4": 444, "v5": 50, "v1": 411, "v3": 433, "v2": null, "__DORIS_SEQUENCE_COL__": 50}
-{"k": 5, "v5": null, "__DORIS_SEQUENCE_COL__": null}
-{"k": 6, "v1": 611, "v3": 633}
-{"k": 7, "v3": 733, "v5": 300, "__DORIS_SEQUENCE_COL__": 300}
-```
-
-最终表中的数据如下:
-```sql
-+---+--------+--------+-----+------+--------+
-| k | v1     | v2     | v3  | v4   | v5     |
-+---+--------+--------+-----+------+--------+
-| 0 | 0      | 0      | 0   | 0    | 0      |
-| 1 | 1      | 1      | 1   | 1    | 10     |
-| 5 | 5      | 5      | 5   | 5    | 50     |
-| 2 | 2      | 222    | 2   | 2    | 25     |
-| 3 | 3      | 3      | 333 | 3    | 30     |
-| 4 | 411    | <null> | 433 | 444  | 50     |
-| 6 | 611    | 9876   | 633 | 1234 | <null> |
-| 7 | <null> | 9876   | 733 | 1234 | 300    |
-+---+--------+--------+-----+------+--------+
-```
-
-4. 
当目标表的表属性中设置了 `function_column.sequence_col` 这一属性时,灵活列更新导入数据的 json 对象中 Key 为 `__DORIS_SEQUENCE_COL__` 的键值对将被忽略,导入中某一行 `__DORIS_SEQUENCE_COL__` 列的值将与这一行中表属性 `function_column.sequence_col` 所指定的列最终的值完全一致。
-
-例如,对于下表:
-```sql
-CREATE TABLE t3 (
-    `k` int(11) NULL, 
-    `v1` BIGINT NULL,
-    `v2` BIGINT NULL DEFAULT "9876",
-    `v3` BIGINT NOT NULL,
-    `v4` BIGINT NOT NULL DEFAULT "1234",
-    `v5` BIGINT NULL DEFAULT "31"
-) UNIQUE KEY(`k`) DISTRIBUTED BY HASH(`k`) BUCKETS 1
-PROPERTIES(
-"replication_num" = "3",
-"enable_unique_key_merge_on_write" = "true",
-"enable_unique_key_skip_bitmap_column" = "true",
-"function_column.sequence_col" = "v5");
-```
-
-表中有如下原始数据:
-```sql
-+---+----+----+----+----+----+----------------------+
-| k | v1 | v2 | v3 | v4 | v5 |__DORIS_SEQUENCE_COL__|
-+---+----+----+----+----+----+----------------------+
-| 0 | 0  | 0  | 0  | 0  | 0  | 0                    |
-| 1 | 1  | 1  | 1  | 1  | 10 | 10                   |
-| 2 | 2  | 2  | 2  | 2  | 20 | 20                   |
-| 3 | 3  | 3  | 3  | 3  | 30 | 30                   |
-| 4 | 4  | 4  | 4  | 4  | 40 | 40                   |
-| 5 | 5  | 5  | 5  | 5  | 50 | 50                   |
-+---+----+----+----+----+----+----------------------+
-```
-
-通过灵活列更新导入如下数据:
-```json
-{"k": 1, "v1": 111, "v5": 9}
-{"k": 2, "v2": 222, "v5": 25}
-{"k": 3, "v3": 333}
-{"k": 4, "v4": 444, "v5": 50, "v1": 411, "v3": 433, "v2": null}
-{"k": 5, "v5": null}
-{"k": 6, "v1": 611, "v3": 633}
-{"k": 7, "v3": 733, "v5": 300}
-```
-
-最终表中的数据如下:
-```sql
-+---+--------+--------+-----+------+-----+
-| k | v1     | v2     | v3  | v4   | v5  |
-+---+--------+--------+-----+------+-----+
-| 0 | 0      | 0      | 0   | 0    | 0   |
-| 1 | 1      | 1      | 1   | 1    | 10  |
-| 5 | 5      | 5      | 5   | 5    | 50  |
-| 2 | 2      | 222    | 2   | 2    | 25  |
-| 3 | 3      | 3      | 333 | 3    | 30  |
-| 4 | 411    | <null> | 433 | 444  | 50  |
-| 6 | 611    | 9876   | 633 | 1234 | 31  |
-| 7 | <null> | 9876   | 733 | 1234 | 300 |
-+---+--------+--------+-----+------+-----+
-```
-
-5. 使用灵活列更新时不能指定或开启如下一些导入参数:
-   - 不能指定 `merge_type` 参数
-   - 不能指定 `delete` 参数
-   - 不能开启 `fuzzy_parse` 参数
-   - 不能指定 `columns` 参数
-   - 不能指定 `jsonpaths` 参数
-   - 不能指定 `hidden_columns` 参数
-   - 不能指定 `function_column.sequence_col` 参数
-   - 不能指定 `sql` 参数
-   - 不能开启 `memtable_on_sink_node` 前移
-   - 不能指定 `group_commit` 参数
-   - 不能指定 `where` 参数
-
-6. 不支持在有 Variant 列的表上进行灵活列更新。
-
-7. 
不支持在有同步物化视图的表上进行灵活列更新。
-
-
-## 部分列更新/灵活列更新中对新插入的行的处理
-
-session variable 或导入属性 `partial_update_new_key_behavior` 用于控制部分列更新和灵活列更新中插入的新行的行为。
-
-当`partial_update_new_key_behavior=ERROR`时,插入的每一行数据必须满足该行数据的 Key 在表中已经存在。而当`partial_update_new_key_behavior=APPEND`时,进行部分列更新或灵活列更新时可以更新 Key 已经存在的行,也可以插入 Key 不存在的新行。
-
-例如有表结构如下:
-```sql
-CREATE TABLE user_profile
-(
-    id INT,
-    name VARCHAR(10),
-    age INT,
-    city VARCHAR(10),
-    balance DECIMAL(9, 0),
-    last_access_time DATETIME
-) ENGINE=OLAP
-UNIQUE KEY(id)
-DISTRIBUTED BY HASH(id) BUCKETS 1
-PROPERTIES (
-    "enable_unique_key_merge_on_write" = "true"
-);
-```
-
-表中有一条数据如下:
-```sql
-mysql> select * from user_profile;
-+------+-------+------+----------+---------+---------------------+
-| id   | name  | age  | city     | balance | last_access_time    |
-+------+-------+------+----------+---------+---------------------+
-|    1 | kevin |   18 | shenzhen |     400 | 2023-07-01 12:00:00 |
-+------+-------+------+----------+---------+---------------------+
-```
-
-当用户在`partial_update_new_key_behavior=ERROR`的情况下使用 Insert Into 部分列更新向表中插入如下数据时,由于第二、三行的数据的 key(`(3)`, `(18)`) 不在原表中,所以本次插入会失败:
-```sql
-SET enable_unique_key_partial_update=true;
-SET partial_update_new_key_behavior=ERROR;
-INSERT INTO user_profile (id, balance, last_access_time) VALUES
-(1, 500, '2023-07-03 12:00:01'),
-(3, 23, '2023-07-03 12:00:02'),
-(18, 9999999, '2023-07-03 12:00:03');
-(1105, "errCode = 2, detailMessage = (127.0.0.1)[INTERNAL_ERROR]tablet error: [E-7003]Can't append new rows in partial update when partial_update_new_key_behavior is ERROR. Row with key=[3] is not in table., host: 127.0.0.1")
-```
-
-当用户在`partial_update_new_key_behavior=APPEND`的情况下使用 Insert Into 部分列更新向表中插入如下数据时:
-```sql
-SET enable_unique_key_partial_update=true;
-SET partial_update_new_key_behavior=APPEND;
-INSERT INTO user_profile (id, balance, last_access_time) VALUES
-(1, 500, '2023-07-03 12:00:01'),
-(3, 23, '2023-07-03 12:00:02'),
-(18, 9999999, '2023-07-03 12:00:03');
-```
-
-表中原有的一条数据将会被更新,此外还向表中插入了两条新数据。对于插入的数据中用户没有指定的列,如果该列有默认值,则会以默认值填充;否则,如果该列可以为 NULL,则将以 NULL 值填充;否则本次插入不成功。
-
-查询结果如下:
-```sql
-mysql> select * from user_profile;
-+------+-------+------+----------+---------+---------------------+
-| id   | name  | age  | city     | balance | last_access_time    |
-+------+-------+------+----------+---------+---------------------+
-|    1 | kevin |   18 | shenzhen |     500 | 2023-07-03 12:00:01 |
-|    3 | NULL  | NULL | NULL     |      23 | 2023-07-03 12:00:02 |
-|   18 | NULL  | NULL | NULL     | 9999999 | 2023-07-03 12:00:03 |
-+------+-------+------+----------+---------+---------------------+
-```
+关于主键模型(Unique Key Model)表的列更新详细说明,包括使用示例、灵活部分列更新和新行处理等内容,请参考[列更新](./partial-column-update.md#主键模型的列更新)文档。

Review Comment:
   This reference points to `./partial-column-update.md#主键模型的列更新` but the file `partial-column-update.md` does not exist in the `i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/` directory. This will create a broken link. The Chinese version of the file needs to be created.
   ```suggestion
   关于主键模型(Unique Key Model)表的列更新详细说明,包括使用示例、灵活部分列更新和新行处理等内容,请参考《列更新》文档。
   ```



##########
docs/data-operate/update/update-of-aggregate-model.md:
##########
@@ -12,52 +12,6 @@ This document primarily introduces how to update the Doris Aggregate model based
 
 When loading data into the Aggregate model table using Doris-supported methods such as Stream Load, Broker Load, Routine Load, Insert Into, etc., the new values will be aggregated with the old values according to the column's aggregation function to produce new aggregated values. 
This value may be generated at the time of insertion or during asynchronous compaction, but users will get the same return value when querying. -## Partial Column Update of Aggregate Model +## Partial Column Update -The Aggregate table is mainly used in pre-aggregation scenarios rather than data update scenarios, but partial column updates can be achieved by setting the aggregation function to REPLACE_IF_NOT_NULL. - -**Create Table** - -Set the aggregation function of the fields that need to be updated to `REPLACE_IF_NOT_NULL`. - -```sql -CREATE TABLE order_tbl ( - order_id int(11) NULL, - order_amount int(11) REPLACE_IF_NOT_NULL NULL, - order_status varchar(100) REPLACE_IF_NOT_NULL NULL -) ENGINE=OLAP -AGGREGATE KEY(order_id) -COMMENT 'OLAP' -DISTRIBUTED BY HASH(order_id) BUCKETS 1 -PROPERTIES ( -"replication_allocation" = "tag.location.default: 1" -); -``` - -**Data Insertion** - -Whether it is Stream Load, Broker Load, Routine Load, or `INSERT INTO`, directly write the data of the fields to be updated. - -**Example** - -Similar to the previous examples, the corresponding Stream Load command is (no additional header required): - -```shell -$ cat update.csv - -1,To be shipped - -curl --location-trusted -u root: -H "column_separator:," -H "columns:order_id,order_status" -T ./update.csv http://127.0.0.1:8030/api/db1/order_tbl/_stream_load -``` - -The corresponding `INSERT INTO` statement is (no additional session variable settings required): - -```sql -INSERT INTO order_tbl (order_id, order_status) values (1,'Shipped'); -``` - -## Notes on Partial Column Updates - -The Aggregate Key model does not perform any additional processing during the write process, so the write performance is not affected and is the same as normal data load. However, the cost of aggregation during query is relatively high, and the typical aggregation query performance is 5-10 times lower than the Merge-on-Write implementation of the Unique Key model. - -Since the `REPLACE_IF_NOT_NULL` aggregation function only takes effect when the value is not NULL, users cannot change a field value to NULL. +For detailed information on partial column updates for Aggregate Key Model tables, including table creation, data insertion examples, and usage notes, please refer to [Partial Column Update](./partial-column-update.md#partial-column-update-for-aggregate-key-model). Review Comment: This reference points to `./partial-column-update.md#partial-column-update-for-aggregate-key-model` but the file `partial-column-update.md` does not exist in the `docs/data-operate/update/` directory. This will create a broken link. ```suggestion For detailed information on partial column updates for Aggregate Key Model tables, including table creation, data insertion examples, and usage notes, please refer to the "Partial Column Update" documentation. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
