This is an automated email from the ASF dual-hosted git repository.

lzljs3620320 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/paimon.git
The following commit(s) were added to refs/heads/master by this push:
     new d62baa7e65 [docs] Add spec demonstration for data evolution mode (#6148)
d62baa7e65 is described below

commit d62baa7e659f0ba9257142aacb7afd2128990f71
Author: YeJunHao <41894543+leaves12...@users.noreply.github.com>
AuthorDate: Tue Aug 26 12:11:22 2025 +0800

    [docs] Add spec demonstration for data evolution mode (#6148)
---
 docs/content/append-table/data-evolution.md |  22 +++++++++++++++++++++-
 docs/static/img/data-evolution.png          | Bin 0 -> 1110138 bytes
 docs/static/img/data-evolution2.png         | Bin 0 -> 1370787 bytes
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/docs/content/append-table/data-evolution.md b/docs/content/append-table/data-evolution.md
index 41da4a3eca..7d3ff96a11 100644
--- a/docs/content/append-table/data-evolution.md
+++ b/docs/content/append-table/data-evolution.md
@@ -77,8 +77,28 @@ SELECT * FROM my_table;
 ```
 
 This statement updates only the `b` column in the target table `target_table` based on the matching records from the source table
-`source_table`. The `id` column and `c` column remain unchanged, and new records are inserted with the specified values.
+`source_table`. The `id` column and `c` column remain unchanged, and new records are inserted with the specified values. The difference between this and table those are not enabled with data evolution is that only the `b` column data is written to new files.
 
 Note that:
 * Data Evolution Table does not support 'Delete' statement yet.
 * Merge Into for Data Evolution Table does not support 'WHEN NOT MATCHED BY SOURCE' clause.
+
+## Spec
+
+When writing: MERGE INTO clause for Data Evolution Table only updates the specified columns, and writes the updated column data to new files. The original data files remain unchanged.
+
+When reading: Paimon reads both the original data files and the new files containing the updated column data. It then merges the data from these two sources to present a unified view of the table. This merging process is optimized to ensure that read performance is not significantly impacted.
+
+After writing, the files in `target_table` like below:
+
+{{< img src="/img/data-evolution.png">}}
+
+When reading, the files with the same `first row id` will merge fields.
+
+{{< img src="/img/data-evolution2.png">}}
+
+The advantage to the mode is:
+
+* Avoid rewriting the whole file when updating partial columns, reducing I/O cost.
+* The read performance is not significantly impacted, as the merge process is optimized.
+* The disk space is used more efficiently, as only the updated columns are written to new files.
\ No newline at end of file
diff --git a/docs/static/img/data-evolution.png b/docs/static/img/data-evolution.png
new file mode 100644
index 0000000000..c70b91da4d
Binary files /dev/null and b/docs/static/img/data-evolution.png differ
diff --git a/docs/static/img/data-evolution2.png b/docs/static/img/data-evolution2.png
new file mode 100644
index 0000000000..a84fc85abb
Binary files /dev/null and b/docs/static/img/data-evolution2.png differ
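
The read path described in the new `## Spec` section (files with the same `first row id` merge their fields) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Paimon's implementation: the dictionary layout, the `seq` write-order field, the `_row_id` output key, and the function name `merge_by_first_row_id` are all hypothetical, and the sketch assumes that every file in a group covers the same row range.

```python
from collections import defaultdict


def merge_by_first_row_id(files):
    """Merge column files that share a first row id into full rows.

    Each file is a dict with hypothetical keys (not Paimon's format):
      first_row_id: row id of the first record in the file
      seq:          write order; a higher value means a later write
      columns:      mapping of column name -> list of values, one per row
    """
    # Group files that describe the same range of rows.
    groups = defaultdict(list)
    for f in files:
        groups[f["first_row_id"]].append(f)

    rows = []
    for first_row_id in sorted(groups):
        merged = {}
        # Apply files oldest first so later writes override earlier columns.
        for f in sorted(groups[first_row_id], key=lambda f: f["seq"]):
            merged.update(f["columns"])

        # Assumption: all columns in a group have the same number of rows.
        row_count = len(next(iter(merged.values())))
        for offset in range(row_count):
            row = {"_row_id": first_row_id + offset}
            for name, values in merged.items():
                row[name] = values[offset]
            rows.append(row)
    return rows


# The original file holds every column; the MERGE INTO wrote only column b.
original = {
    "first_row_id": 0,
    "seq": 1,
    "columns": {"id": [1, 2], "b": [10, 20], "c": [100, 200]},
}
updated_b = {"first_row_id": 0, "seq": 2, "columns": {"b": [11, 21]}}

rows = merge_by_first_row_id([original, updated_b])
for row in rows:
    print(row)
```

In this model the original file is never rewritten: the update adds a second, narrower file, and the reader overlays its `b` values on top of the untouched `id` and `c` columns.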