This is an automated email from the ASF dual-hosted git repository.
russellspitzer pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/iceberg.git
The following commit(s) were added to refs/heads/master by this push:
new 11a708af9d Docs: Spark Schema Merge docs (#8528)
11a708af9d is described below
commit 11a708af9d3417a1840968f46b231248b3388018
Author: Andrea Campolonghi <[email protected]>
AuthorDate: Thu Sep 14 18:31:49 2023 +0200
Docs: Spark Schema Merge docs (#8528)
---
docs/spark-writes.md | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/docs/spark-writes.md b/docs/spark-writes.md
index ea62c4b333..db641fc9b9 100644
--- a/docs/spark-writes.md
+++ b/docs/spark-writes.md
@@ -313,6 +313,33 @@ data.writeTo("prod.db.table")
.createOrReplace()
```
+### Schema Merge
+
+While inserting or updating, Iceberg can resolve schema mismatches at runtime. If configured, Iceberg performs automatic schema evolution as follows:
+
+
+* A new column is present in the source but not in the target table.
+
+ The new column is added to the target table. Column values are set to `NULL` in all the rows already present in the table.
+
+* A column is present in the target but not in the source.
+
+ The target column value is set to `NULL` when inserting or left unchanged when updating the row.
+
+The target table must be configured to accept any schema change by setting the property `write.spark.accept-any-schema` to `true`.
+
+```sql
+ALTER TABLE prod.db.sample SET TBLPROPERTIES (
+ 'write.spark.accept-any-schema'='true'
+)
+```
+The writer must enable the `mergeSchema` option.
+
+```scala
+data.writeTo("prod.db.sample").option("mergeSchema","true").append()
+```
+
+
## Writing Distribution Modes
Iceberg's default Spark writers require that the data in each spark task is
clustered by partition values. This