This is an automated email from the ASF dual-hosted git repository.
bhavanisudha pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 1315c99db2f8 Fixing merging data section in quick start (#14288)
1315c99db2f8 is described below
commit 1315c99db2f87ff1d7d84ea6596086daa3fc6f81
Author: Sivabalan Narayanan <[email protected]>
AuthorDate: Fri Nov 14 17:25:35 2025 -0800
Fixing merging data section in quick start (#14288)
---
website/docs/quick-start-guide.md | 48 +++++----------------------------------
1 file changed, 6 insertions(+), 42 deletions(-)
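The partial-update `MERGE INTO` semantics that this commit folds into the merging section can be illustrated without Spark. The statement updates only `fare` when the join on `uuid` matches and runs `INSERT *` otherwise. Below is a minimal plain-Python sketch of those semantics over lists of row dicts. It is not Hudi's implementation. The function name `merge_partial_update` and all sample rows and values are hypothetical.

```python
from copy import deepcopy


def merge_partial_update(target, source, key="uuid", update_cols=("fare",)):
    """Hypothetical sketch of:
    MERGE INTO target USING source ON target.<key> = source.<key>
    WHEN MATCHED THEN UPDATE SET <update_cols> = source.<update_cols>
    WHEN NOT MATCHED THEN INSERT *
    Rows are plain dicts; the input lists are not mutated."""
    result = {row[key]: deepcopy(row) for row in target}
    for src in source:
        k = src[key]
        if k in result:
            # Matched: overwrite only the listed columns; every other column
            # keeps its existing target value (the "partial update").
            for col in update_cols:
                result[k][col] = src[col]
        else:
            # Not matched: insert the full source row (INSERT *).
            result[k] = deepcopy(src)
    return list(result.values())


# Hypothetical sample data, loosely shaped like the quick start trips table.
target = [
    {"uuid": "a", "rider": "rider-A", "fare": 19.1},
    {"uuid": "b", "rider": "rider-B", "fare": 27.7},
]
source = [
    {"uuid": "a", "rider": "ignored-on-match", "fare": 191.0},
    {"uuid": "c", "rider": "rider-C", "fare": 33.9},
]
merged = merge_partial_update(target, source)
```

Row `a` keeps `rider-A` but takes the new fare, row `b` is untouched, and row `c` is inserted whole. This mirrors why a partial update writes fewer columns than `UPDATE SET *`.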
diff --git a/website/docs/quick-start-guide.md b/website/docs/quick-start-guide.md
index 86381e7fe5ad..58d62a237b73 100644
--- a/website/docs/quick-start-guide.md
+++ b/website/docs/quick-start-guide.md
@@ -462,39 +462,16 @@ values={[
```scala
// spark-shell
-val adjustedFareDF = spark.read.format("hudi").
- load(basePath).limit(2).
- withColumn("fare", col("fare") * 10)
-
-adjustedFareDF.write.format("hudi").
- option("hoodie.datasource.write.payload.class","com.payloads.CustomMergeIntoConnector").
- mode(Append).
- save(basePath)
-// Notice Fare column has been updated but all other columns remain intact.
-spark.read.format("hudi").load(basePath).show()
+Feel free to use "upsert" operation as showed under "Update data" section. Or leverage MergeInto with Spark sql writes.
```
-The `com.payloads.CustomMergeIntoConnector` adds adjusted fare values to the original table and preserves all other fields.
-Refer [here](https://gist.github.com/bhasudha/7ea07f2bb9abc5c6eb86dbd914eec4c6) for sample implementation of `com.payloads.CustomMergeIntoConnector`.
-
</TabItem>
<TabItem value="python">
```python
# pyspark
-adjustedFareDF = spark.read.format("hudi").load(basePath). \
- limit(2).withColumn("fare", col("fare") * 100)
-adjustedFareDF.write.format("hudi"). \
-option("hoodie.datasource.write.payload.class","com.payloads.CustomMergeIntoConnector"). \
-mode("append"). \
-save(basePath)
-# Notice Fare column has been updated but all other columns remain intact.
-spark.read.format("hudi").load(basePath).show()
+Feel free to use "upsert" operation as showed under "Update data" section. Or leverage MergeInto with Spark sql writes.
```
-
-The `com.payloads.CustomMergeIntoConnector` adds adjusted fare values to the original table and preserves all other fields.
-Refer [here](https://gist.github.com/bhasudha/7ea07f2bb9abc5c6eb86dbd914eec4c6) for sample implementation of `com.payloads.CustomMergeIntoConnector`.
-
</TabItem>
<TabItem value="sparksql">
@@ -519,6 +496,10 @@ WHEN NOT MATCHED THEN INSERT *
```
+Partial updates only write updated columns instead of full update record. This is useful when you have hundreds of columns
+and only a few columns are updated. It reduces the write costs as well as storage costs. Note that when the condition is
+matched, we only update fare column.
+
:::info Key requirements
1. For a Hudi table with user defined primary record [keys](#keys), the join condition is expected to contain the primary keys of the table.
For a Hudi table with Hudi generated primary keys, the join condition can be on any arbitrary data columns.
@@ -526,23 +507,6 @@ For a Hudi table with Hudi generated primary keys, the join condition can be on
</TabItem>
</Tabs>
-## Merging Data (Partial Updates) {#merge-partial-update}
-
-Partial updates only write updated columns instead of full update record. This is useful when you have hundreds of
-columns and only a few columns are updated. It reduces the write costs as well as storage costs.
-`MERGE INTO` statement above can be modified to use partial updates as shown below.
-
-```sql
-MERGE INTO hudi_table AS target
-USING fare_adjustment AS source
-ON target.uuid = source.uuid
-WHEN MATCHED THEN UPDATE SET fare = source.fare
-WHEN NOT MATCHED THEN INSERT *
-;
-```
-
-Notice, instead of `UPDATE SET *`, we are updating only the `fare` column.
-
## Delete data {#deletes}
Delete operation removes the records specified from the table. For example, this code snippet deletes records