(incubator-paimon) branch master updated: [spark][docs] doc for spark schema evolution (#2337)

lzljs3620320 Thu, 16 Nov 2023 23:27:34 -0800

This is an automated email from the ASF dual-hosted git repository.

lzljs3620320 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-paimon.git



The following commit(s) were added to refs/heads/master by this push:
     new 211d74bfa [spark][docs] doc for spark schema evolution (#2337)
211d74bfa is described below

commit 211d74bfabac2dc85e62e0349b312741192ce47c
Author: Yann Byron <[email protected]>
AuthorDate: Fri Nov 17 15:26:53 2023 +0800

    [spark][docs] doc for spark schema evolution (#2337)
---
 docs/content/engines/spark3.md | 56 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/docs/content/engines/spark3.md b/docs/content/engines/spark3.md
index 21af9d565..ade2ebc74 100644
--- a/docs/content/engines/spark3.md
+++ b/docs/content/engines/spark3.md
@@ -425,6 +425,62 @@ val query = spark.readStream
 */
 ```
 
+## Schema Evolution
+
+Schema evolution lets users modify the current schema of a table to fit existing data, or new data that changes over time, while maintaining data integrity and consistency.
+
+Paimon can automatically merge the schema of the source data with the current table schema while data is being written, and then uses the merged schema as the latest schema of the table. This only requires setting `write.merge-schema` to `true`.
+
+```scala
+data.write
+  .format("paimon")
+  .mode("append")
+  .option("write.merge-schema", "true")
+  .save(location)
+```
+
+When `write.merge-schema` is enabled, Paimon allows the following changes to the table schema by default:
+- Adding columns
+- Up-casting the type of a column (e.g. Int -> Long)
+
+Paimon also supports explicit casts between certain types (e.g. String -> Date, Long -> Int); these additionally require `write.merge-schema.explicit-cast` to be set to `true`.
+
+Schema evolution can also be used in streaming mode.
+
+```scala
+val inputData = MemoryStream[(Int, String)]
+inputData
+  .toDS()
+  .toDF("col1", "col2")
+  .writeStream
+  .format("paimon")
+  .option("checkpointLocation", "/path/to/checkpoint")
+  .option("write.merge-schema", "true")
+  .option("write.merge-schema.explicit-cast", "true")
+  .start(location)
+```
+
+The relevant configurations are listed below.
+
+<table class="configuration table table-bordered">
+    <thead>
+        <tr>
+            <th class="text-left" style="width: 20%">Key</th>
+            <th class="text-left" style="width: 60%">Description</th>
+        </tr>
+    </thead>
+    <tbody>
+        <tr>
+            <td><h5>write.merge-schema</h5></td>
+            <td>If true, automatically merge the data schema with the table schema before writing data.</td>
+        </tr>
+        <tr>
+            <td><h5>write.merge-schema.explicit-cast</h5></td>
+            <td>If true, allow data types to be merged when the two types satisfy the rules for explicit casting.</td>
+        </tr>
+    </tbody>
+</table>
+
 ## Spark Procedure
 
 This section introduces all available Spark procedures for Paimon.
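The diff documents `write.merge-schema` for batch writes. As an illustration only, here is a minimal end-to-end Scala sketch of the two default behaviors it lists: adding a column and up-casting Int to Long. It assumes Spark 3.x in local mode with a Paimon Spark runtime jar on the classpath. It also assumes a catalog registered under the hypothetical name `paimon` over a temporary warehouse, plus a hypothetical database `test_db` and table `T`. The path-based location assumes Paimon's `<warehouse>/<database>.db/<table>` directory layout. The table is created up front because the path-based writer is assumed to need an existing table.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession

object SchemaEvolutionSketch {
  // Returns (column name, Spark type) pairs of the table after two appends.
  def run(): Seq[(String, String)] = {
    // Fresh temporary warehouse so every run starts from an empty table.
    val warehouse =
      Files.createTempDirectory("paimon-warehouse").toUri.toString.stripSuffix("/")

    // Assumption: the Paimon Spark jar is on the classpath; `paimon` is a hypothetical catalog name.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("paimon-merge-schema-sketch")
      .config("spark.sql.catalog.paimon", "org.apache.paimon.spark.SparkCatalog")
      .config("spark.sql.catalog.paimon.warehouse", warehouse)
      .getOrCreate()
    import spark.implicits._

    try {
      // Hypothetical database and table; the table starts as (a INT, b STRING).
      spark.sql("CREATE DATABASE IF NOT EXISTS paimon.test_db")
      spark.sql("CREATE TABLE paimon.test_db.T (a INT, b STRING)")
      // Assumes Paimon's <warehouse>/<database>.db/<table> directory layout.
      val location = s"$warehouse/test_db.db/T"

      // First batch matches the table schema exactly.
      Seq((1, "x")).toDF("a", "b")
        .write.format("paimon").mode("append").save(location)

      // Second batch: `a` is now a Long (up-cast) and `c` is a new column.
      Seq((2L, "y", 1.5)).toDF("a", "b", "c")
        .write.format("paimon").mode("append")
        .option("write.merge-schema", "true")
        .save(location)

      // Read the table back and report its (possibly evolved) schema.
      spark.read.format("paimon").load(location)
        .schema.fields.map(f => f.name -> f.dataType.simpleString).toSeq
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    run().foreach { case (name, tpe) => println(s"$name: $tpe") }
}
```

If merging behaves as the diff describes, the schema read back should list `a` as a long (`bigint`), followed by `b` and the newly added column `c`.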

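The streaming snippet in the diff omits the setup that `MemoryStream` needs: an implicit `SQLContext` and an `Encoder`, here taken from `spark.implicits._`. It also never feeds the stream any data. Below is a self-contained sketch, offered for illustration only. It starts from a table that has only `col1`, so `col2` has to be added by schema merging.

It rests on these assumptions:
- Spark 3.x runs in local mode with a Paimon Spark runtime jar on the classpath.
- A catalog is registered under the hypothetical name `paimon` over a temporary warehouse.
- The hypothetical database `test_db` and table `T` are used.
- The table path follows Paimon's `<warehouse>/<database>.db/<table>` layout.

`MemoryStream` is an internal Spark source meant for testing and is used here only to drive the example.

```scala
import java.nio.file.Files

import org.apache.spark.sql.{SQLContext, SparkSession}
import org.apache.spark.sql.execution.streaming.MemoryStream

object StreamingSchemaEvolutionSketch {
  // Returns the column names of the table after one streamed micro-batch.
  def run(): Seq[String] = {
    val warehouse =
      Files.createTempDirectory("paimon-warehouse").toUri.toString.stripSuffix("/")
    val checkpoint = Files.createTempDirectory("paimon-checkpoint").toString

    // Assumption: the Paimon Spark jar is on the classpath; `paimon` is a hypothetical catalog name.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("paimon-streaming-merge-schema-sketch")
      .config("spark.sql.catalog.paimon", "org.apache.paimon.spark.SparkCatalog")
      .config("spark.sql.catalog.paimon.warehouse", warehouse)
      .getOrCreate()
    import spark.implicits._                            // Encoder[(Int, String)]
    implicit val sqlContext: SQLContext = spark.sqlContext // required by MemoryStream

    try {
      // Hypothetical table with only col1; col2 is expected to be added by merging.
      spark.sql("CREATE DATABASE IF NOT EXISTS paimon.test_db")
      spark.sql("CREATE TABLE paimon.test_db.T (col1 INT)")
      val location = s"$warehouse/test_db.db/T"

      val inputData = MemoryStream[(Int, String)]
      val query = inputData
        .toDS()
        .toDF("col1", "col2")
        .writeStream
        .format("paimon")
        .option("checkpointLocation", checkpoint)
        .option("write.merge-schema", "true")
        .start(location)

      // Push one micro-batch and wait until the sink has processed it.
      inputData.addData((1, "a"), (2, "b"))
      query.processAllAvailable()
      query.stop()

      spark.read.format("paimon").load(location).schema.fieldNames.toSeq
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run().mkString(", "))
}
```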