(paimon) branch master updated: [docs] Add spec demonstration for data evolution mode (#6148)

lzljs3620320 Mon, 25 Aug 2025 21:11:33 -0700

This is an automated email from the ASF dual-hosted git repository.

lzljs3620320 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/paimon.git



The following commit(s) were added to refs/heads/master by this push:
     new d62baa7e65 [docs] Add spec demonstration for data evolution mode (#6148)
d62baa7e65 is described below

commit d62baa7e659f0ba9257142aacb7afd2128990f71
Author: YeJunHao <41894543+leaves12...@users.noreply.github.com>
AuthorDate: Tue Aug 26 12:11:22 2025 +0800

    [docs] Add spec demonstration for data evolution mode (#6148)
---
 docs/content/append-table/data-evolution.md |  22 +++++++++++++++++++++-
 docs/static/img/data-evolution.png          | Bin 0 -> 1110138 bytes
 docs/static/img/data-evolution2.png         | Bin 0 -> 1370787 bytes
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/docs/content/append-table/data-evolution.md b/docs/content/append-table/data-evolution.md
index 41da4a3eca..7d3ff96a11 100644
--- a/docs/content/append-table/data-evolution.md
+++ b/docs/content/append-table/data-evolution.md
@@ -77,8 +77,28 @@ SELECT * FROM my_table;
 ```
 
 This statement updates only the `b` column in the target table `target_table` based on the matching records from the source table
-`source_table`. The `id` column and `c` column remain unchanged, and new records are inserted with the specified values.
+`source_table`. The `id` column and `c` column remain unchanged, and new records are inserted with the specified values. Unlike a table without data evolution enabled, only the `b` column data is written to new files.
 
 Note that: 
 * Data Evolution Table does not support 'Delete' statement yet.
 * Merge Into for Data Evolution Table does not support 'WHEN NOT MATCHED BY SOURCE' clause.
+
+## Spec
+
+When writing: the MERGE INTO statement for a Data Evolution Table updates only the specified columns and writes the updated column data to new files. The original data files remain unchanged.
+
+When reading: Paimon reads both the original data files and the new files containing the updated column data. It then merges the data from these two sources to present a unified view of the table. This merging process is optimized to ensure that read performance is not significantly impacted.
+
+After writing, the files in `target_table` look like this:
+
+{{< img src="/img/data-evolution.png">}}
+
+When reading, files with the same `first row id` have their fields merged.
+
+{{< img src="/img/data-evolution2.png">}}
+
+The advantages of this mode are:
+
+* Avoid rewriting the whole file when updating partial columns, reducing I/O cost.
+* The read performance is not significantly impacted, as the merge process is optimized.
+* The disk space is used more efficiently, as only the updated columns are written to new files.
\ No newline at end of file
diff --git a/docs/static/img/data-evolution.png b/docs/static/img/data-evolution.png
new file mode 100644
index 0000000000..c70b91da4d
Binary files /dev/null and b/docs/static/img/data-evolution.png differ
diff --git a/docs/static/img/data-evolution2.png b/docs/static/img/data-evolution2.png
new file mode 100644
index 0000000000..a84fc85abb
Binary files /dev/null and b/docs/static/img/data-evolution2.png differ
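
The read path described in the Spec section of this patch (files with the same `first row id` merge their fields) can be pictured with a small Python sketch. This is a simplified model, not Paimon's implementation: `ColumnFile`, `merge_by_first_row_id`, and the `sequence` field are hypothetical names. It also assumes that a file written by MERGE INTO holds the full updated column for the same row range as the original file, and that a later file overrides an earlier one for any column they share.

```python
from dataclasses import dataclass


@dataclass
class ColumnFile:
    """Hypothetical stand-in for a data file covering rows from first_row_id."""
    first_row_id: int
    columns: dict   # column name -> list of values, one per row
    sequence: int   # assumption: a higher value means the file was written later


def merge_by_first_row_id(files):
    """Group files by first_row_id and merge their columns; later files win."""
    groups = {}
    for f in sorted(files, key=lambda f: f.sequence):
        # A later file replaces only the columns it contains.
        groups.setdefault(f.first_row_id, {}).update(f.columns)

    rows = []
    for first_row_id in sorted(groups):
        cols = groups[first_row_id]
        names = list(cols)
        # Stitch the column lists back into row dictionaries.
        for values in zip(*(cols[n] for n in names)):
            rows.append(dict(zip(names, values)))
    return rows


# Original file with all columns, plus a new file holding only the updated `b`.
original = ColumnFile(0, {"id": [1, 2], "b": ["b1", "b2"], "c": ["c1", "c2"]}, sequence=1)
updated_b = ColumnFile(0, {"b": ["b1_new", "b2_new"]}, sequence=2)

rows = merge_by_first_row_id([original, updated_b])
```

In this model the original file is never rewritten: the `id` and `c` values still come from it, and only `b` is taken from the newer file.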
