(iceberg) branch main updated: Docs: Add examples for DataFrame branch writes (#10644)

szehon Wed, 17 Jul 2024 11:07:21 -0700

This is an automated email from the ASF dual-hosted git repository.

szehon pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/iceberg.git



The following commit(s) were added to refs/heads/main by this push:
     new 319f29ea86 Docs: Add examples for DataFrame branch writes (#10644)
319f29ea86 is described below

commit 319f29ea860e42e7cc21cda8c05d882134e6431f
Author: Anurag Mantripragada <[email protected]>
AuthorDate: Wed Jul 17 23:37:10 2024 +0530

    Docs: Add examples for DataFrame branch writes (#10644)
---
 docs/docs/spark-writes.md | 31 +++++++++++++++++++++++++------
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/docs/docs/spark-writes.md b/docs/docs/spark-writes.md
index 96fcc5f7ce..cc8ca76fe5 100644
--- a/docs/docs/spark-writes.md
+++ b/docs/docs/spark-writes.md
@@ -195,16 +195,19 @@ WHERE EXISTS (SELECT oid FROM prod.db.returned_orders WHERE t1.oid = oid)
 For more complex row-level updates based on incoming data, see the section on `MERGE INTO`.
 
 ## Writing to Branches
-Branch writes can be performed via SQL by providing a branch identifier, `branch_yourBranch` in the operation.
-Branch writes can also be performed as part of a write-audit-publish (WAP) workflow by specifying the `spark.wap.branch` config.
-Note WAP branch and branch identifier cannot both be specified.
-Also, the branch must exist before performing the write. 
-The operation does **not** create the branch if it does not exist. 
-For more information on branches please refer to [branches](branching.md).
+
+The branch must exist before performing a write. Operations do **not** create the branch if it does not exist.
+A branch can be created using [Spark DDL](spark-ddl.md#branching-and-tagging-ddl).
 
 !!! info
     Note: When writing to a branch, the current schema of the table will be used for validation.
 
+### Via SQL
+
+Branch writes can be performed by providing a branch identifier, `branch_yourBranch` in the operation.
+
+Branch writes can also be performed as part of a write-audit-publish (WAP) workflow by specifying the `spark.wap.branch` config.
+Note WAP branch and branch identifier cannot both be specified.
  
 ```sql
 -- INSERT (1,' a') (2, 'b') into the audit branch.
@@ -228,6 +231,22 @@ SET spark.wap.branch = audit-branch
 INSERT INTO prod.db.table VALUES (3, 'c');
 ```
 
+### Via DataFrames
+
+Branch writes via DataFrames can be performed by providing a branch identifier, `branch_yourBranch` in the operation.
+
+```scala
+// To insert into `audit` branch
+val data: DataFrame = ...
+data.writeTo("prod.db.table.branch_audit").append()
+```
+
+```scala
+// To overwrite `audit` branch
+val data: DataFrame = ...
+data.writeTo("prod.db.table.branch_audit").overwritePartitions()
+```
+
 ## Writing with DataFrames
 
 Spark 3 introduced the new `DataFrameWriterV2` API for writing to tables using data frames. The v2 API is recommended for several reasons:
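
Note on the `Via DataFrames` hunk above: both added examples pass the same `<table>.branch_<name>` identifier to `writeTo`. A minimal sketch of building that identifier in one place follows. `BranchIdentifier` and `forBranch` are hypothetical names introduced for illustration, not part of Iceberg or Spark. The helper only concatenates strings; it does not check that the branch exists and does no escaping of special characters.

```scala
// Hypothetical helper (not an Iceberg or Spark API): builds the
// `<table>.branch_<name>` identifier used in the DataFrame examples.
object BranchIdentifier {
  // Prefix taken from the `branch_yourBranch` convention in the docs.
  private val Prefix = "branch_"

  def forBranch(table: String, branch: String): String = {
    require(table.nonEmpty, "table identifier must not be empty")
    require(branch.nonEmpty, "branch name must not be empty")
    s"${table}.${Prefix}${branch}"
  }
}

// Intended use with a DataFrame `data` (needs a Spark session configured
// with an Iceberg catalog, and the `audit` branch must already exist):
//   data.writeTo(BranchIdentifier.forBranch("prod.db.table", "audit")).append()
```

Keeping the prefix in one place may help avoid typos such as `branch-audit`, which would not match the documented identifier form.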
