This is an automated email from the ASF dual-hosted git repository.
szehon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/iceberg.git
The following commit(s) were added to refs/heads/main by this push:
new 319f29ea86 Docs: Add examples for DataFrame branch writes (#10644)
319f29ea86 is described below
commit 319f29ea860e42e7cc21cda8c05d882134e6431f
Author: Anurag Mantripragada <[email protected]>
AuthorDate: Wed Jul 17 23:37:10 2024 +0530
Docs: Add examples for DataFrame branch writes (#10644)
---
docs/docs/spark-writes.md | 31 +++++++++++++++++++++++++------
1 file changed, 25 insertions(+), 6 deletions(-)
diff --git a/docs/docs/spark-writes.md b/docs/docs/spark-writes.md
index 96fcc5f7ce..cc8ca76fe5 100644
--- a/docs/docs/spark-writes.md
+++ b/docs/docs/spark-writes.md
@@ -195,16 +195,19 @@ WHERE EXISTS (SELECT oid FROM prod.db.returned_orders
WHERE t1.oid = oid)
For more complex row-level updates based on incoming data, see the section on
`MERGE INTO`.
## Writing to Branches
-Branch writes can be performed via SQL by providing a branch identifier, `branch_yourBranch` in the operation.
-Branch writes can also be performed as part of a write-audit-publish (WAP) workflow by specifying the `spark.wap.branch` config.
-Note WAP branch and branch identifier cannot both be specified.
-Also, the branch must exist before performing the write.
-The operation does **not** create the branch if it does not exist.
-For more information on branches please refer to [branches](branching.md).
+
+The branch must exist before performing a write. Operations do **not** create the branch if it does not exist.
+A branch can be created using [Spark DDL](spark-ddl.md#branching-and-tagging-ddl).
!!! info
Note: When writing to a branch, the current schema of the table will be
used for validation.
+### Via SQL
+
+Branch writes can be performed by providing a branch identifier, `branch_yourBranch`, in the operation.
+
+Branch writes can also be performed as part of a write-audit-publish (WAP) workflow by specifying the `spark.wap.branch` config.
+Note that a WAP branch and a branch identifier cannot both be specified.
```sql
-- INSERT (1, 'a') (2, 'b') into the audit branch.
@@ -228,6 +231,22 @@ SET spark.wap.branch = audit-branch
INSERT INTO prod.db.table VALUES (3, 'c');
```
+### Via DataFrames
+
+Branch writes via DataFrames can be performed by providing a branch identifier, `branch_yourBranch`, in the operation.
+
+```scala
+// To insert into `audit` branch
+val data: DataFrame = ...
+data.writeTo("prod.db.table.branch_audit").append()
+```
+
+```scala
+// To overwrite `audit` branch
+val data: DataFrame = ...
+data.writeTo("prod.db.table.branch_audit").overwritePartitions()
+```
+
## Writing with DataFrames
Spark 3 introduced the new `DataFrameWriterV2` API for writing to tables using
data frames. The v2 API is recommended for several reasons: