This is an automated email from the ASF dual-hosted git repository.
xushiyan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 8747433cbc5f chore(site): update notebooks (#14166)
8747433cbc5f is described below
commit 8747433cbc5f3e6a2bd90a4746b8c0df00b1d6e4
Author: Shiyan Xu <[email protected]>
AuthorDate: Mon Oct 27 14:43:51 2025 -0700
chore(site): update notebooks (#14166)
---
website/docs/notebooks.md | 74 +++++++++++++---------
.../version-1.0.2}/notebooks.md | 74 +++++++++++++---------
.../versioned_sidebars/version-1.0.2-sidebars.json | 4 +-
3 files changed, 88 insertions(+), 64 deletions(-)
diff --git a/website/docs/notebooks.md b/website/docs/notebooks.md
index 07fd2304353f..46a123b7bfed 100644
--- a/website/docs/notebooks.md
+++ b/website/docs/notebooks.md
@@ -1,6 +1,6 @@
---
title: "Notebooks"
-keywords: [ hudi, notebooks]
+keywords: [ hudi, notebooks ]
toc: true
last_modified_at: 2025-10-09T19:13:57+08:00
---
@@ -13,62 +13,74 @@ All you need is a cloned copy of the Hudi repository and
Docker installed on you
### Setup
- * Clone the [Hudi repository](https://github.com/apache/hudi) to your local
machine.
- * Docker Setup : For Mac, Please follow the steps as defined in [Install
Docker Desktop on Mac](https://docs.docker.com/desktop/install/mac-install/).
For running Spark-SQL queries, please ensure atleast 6 GB and 4 CPUs are
allocated to Docker (See Docker -> Preferences -> Advanced).
- * This setup also needs JDK 8 and maven installed on your system.
- * Build Docker Images
- ```sh
- cd hudi-notebooks
- sh build.sh
- ```
- * Start the Environment
- ```sh
- sh run_spark_hudi.sh start
- ```
+* Clone the [Hudi repository](https://github.com/apache/hudi) to your local
machine.
+* Docker Setup: For macOS, follow the steps in [Install Docker Desktop on
Mac](https://docs.docker.com/desktop/install/mac-install/). For Spark SQL
queries, ensure at least 6 GB of memory and 4 CPUs are allocated to Docker (see
Docker > Preferences > Advanced).
+* Build Docker Images
+
+ ```shell
+ # under Hudi repo root dir
+ cd hudi-notebooks
+ sh build.sh
+ ```
+
+* Start the Environment
+
+ ```shell
+ sh run_spark_hudi.sh start
+ ```
### Meet Your Notebooks
+
#### 1 - Getting Started with Apache Hudi: A Hands-On Guide to CRUD Operations
-This notebook is a beginner friendly, practical guide to working with Apache
Hudi using PySpark. It walks you through the essential CRUD operations (Create,
Read, Update, Delete) on Hudi tables, while also helping you understand key
table types such as Copy-On-Write (COW) and Merge-On-Read (MOR).
-For storage, we use MinIO as an S3-compatible backend, simulating a modern
datalake setup.
+This notebook is a beginner-friendly, practical guide to working with Apache
Hudi using PySpark. It walks you through the essential CRUD operations (Create,
Read, Update, Delete) on Hudi tables, while also helping you understand key
table types such as Copy-On-Write (COW) and Merge-On-Read (MOR).
+
+For storage, we use MinIO as an S3-compatible backend.
**What you will learn:**
-- How to create and update Hudi tables using PySpark
-- The difference between COW and MOR tables
-- Reading data using snapshot and incremental queries
-- How Hudi handles upserts and deletes
+
+* How to create and update Hudi tables using PySpark
+* The difference between COW and MOR tables
+* Reading data using snapshot and incremental queries
+* How Hudi handles upserts and deletes
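The upsert and delete behavior this notebook demonstrates can be sketched in plain Python. This is not Hudi's implementation, only the record-key semantics that an upsert-style write provides; the `trips` table, field names, and the precombine tie-breaking rule shown here are illustrative assumptions.

```python
# Pure-Python sketch of upsert/delete-by-record-key semantics (NOT Hudi code).
# One common payload behavior: on a key collision, the row with the larger
# precombine value (here "ts") wins.

def upsert(table, records, record_key="uuid", precombine="ts"):
    """Insert new records; on key collision keep the row with larger precombine."""
    for rec in records:
        key = rec[record_key]
        existing = table.get(key)
        if existing is None or rec[precombine] >= existing[precombine]:
            table[key] = rec
    return table

def delete(table, keys):
    """Remove records by key; missing keys are ignored."""
    for k in keys:
        table.pop(k, None)
    return table

trips = {}
upsert(trips, [{"uuid": "a", "ts": 1, "fare": 10.0},
               {"uuid": "b", "ts": 1, "fare": 25.0}])
upsert(trips, [{"uuid": "a", "ts": 2, "fare": 12.5}])  # newer ts wins
delete(trips, ["b"])
print(trips)  # {'a': {'uuid': 'a', 'ts': 2, 'fare': 12.5}}
```

In the notebook itself, the same effect comes from Hudi write operations against a real table rather than an in-memory dict.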
#### 2 - Deep Dive into Apache Hudi Table & Query Types: Snapshot, RO,
Incremental, Time Travel, CDC
+
This notebook is your hands-on guide to mastering Apache Hudi's advanced query
capabilities. You will explore practical examples of various read modes such as
Snapshot, Read-Optimized (RO), Incremental, Time Travel, and Change Data
Capture (CDC) so you can understand when and how to use each for building
efficient, real-world data pipelines.
**What you will learn:**
-- How to perform Snapshot and Read-Optimized queries
-- Using Incremental pulls for near real-time data processing
-- Querying historical data with Time Travel
-- Capturing changes with CDC for downstream consumption
+
+* How to perform Snapshot and Read-Optimized queries
+* Using incremental pulls for near-real-time data processing
+* Querying historical data with Time Travel
+* Capturing changes with CDC for downstream consumption
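The idea behind an incremental pull can be sketched without Spark: return only rows committed after a checkpoint instant. The in-memory rows and instant strings below are hypothetical; in the notebook this filtering is done by Hudi's incremental query mode, not by hand.

```python
# Sketch of what an incremental query returns: only records whose commit
# time falls strictly after a given begin instant (hypothetical data).

records = [
    {"uuid": "a", "_hoodie_commit_time": "20251027100000"},
    {"uuid": "b", "_hoodie_commit_time": "20251027110000"},
    {"uuid": "c", "_hoodie_commit_time": "20251027120000"},
]

def incremental_pull(rows, begin_instant):
    """Keep rows committed after the checkpoint instant."""
    return [r for r in rows if r["_hoodie_commit_time"] > begin_instant]

changed = incremental_pull(records, "20251027100000")
print([r["uuid"] for r in changed])  # ['b', 'c']
```

A downstream job would persist the last instant it processed and pass it as the next checkpoint, which is what makes near-real-time pipelines cheap to run.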
#### 3 - Implementing Slowly Changing Dimensions (SCD Type 2 & 4) with Apache
Hudi
+
Dive into this practical guide on implementing two key data warehousing
patterns - Slowly Changing Dimensions (SCD) Type 2 and Type 4 using Apache Hudi.
-SCDs help track changes in dimension data over time without losing historical
context. Instead of overwriting records, these patterns let you maintain a full
history of data changes. Leveraging Hudi's upsert capabilities and rich
metadata, this notebook simplifies what's traditionally a complex process.
+SCDs help track changes in dimension data over time without losing historical
context. Instead of overwriting records, these patterns let you maintain a full
history of data changes. Leveraging Hudi's upsert capabilities and rich
metadata, this notebook simplifies what is traditionally a complex process.
**What you will learn:**
-- SCD Type 2: How to track changes by adding new rows to your dimension tables
-- SCD Type 4: How to manage historical data in a separate history table
+
+* SCD Type 2: How to track changes by adding new rows to your dimension tables
+* SCD Type 4: How to manage historical data in a separate history table
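The SCD Type 2 pattern itself is small enough to sketch in plain Python: close the current row and append a new current one. This is a generic illustration with made-up dimension fields, not Hudi-specific code; the notebook achieves the same effect with Hudi upserts.

```python
from datetime import date

# Minimal SCD Type 2 sketch: track attribute changes by appending rows,
# never overwriting history (illustrative field names).

def scd2_apply(dim_rows, key, new_attrs, as_of):
    """Close the current version of `key` (if changed) and append a new one."""
    for row in dim_rows:
        if row["key"] == key and row["is_current"]:
            if row["attrs"] == new_attrs:
                return dim_rows          # no change, nothing to do
            row["is_current"] = False    # close the old version
            row["end_date"] = as_of
    dim_rows.append({"key": key, "attrs": new_attrs,
                     "start_date": as_of, "end_date": None,
                     "is_current": True})
    return dim_rows

dim = [{"key": "cust-1", "attrs": {"city": "Austin"},
        "start_date": date(2024, 1, 1), "end_date": None, "is_current": True}]
scd2_apply(dim, "cust-1", {"city": "Dallas"}, date(2025, 10, 27))
print(len(dim), dim[-1]["attrs"]["city"])  # 2 Dallas
```

Type 4 differs in that the closed versions would be moved to a separate history table instead of staying in the dimension table.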
#### 4 - Schema Evolution with Apache Hudi: Concepts and Practical Use
-In real-world data lake environments, schema changes are not just common but
they are expected. Whether you are adding new data attributes, adjusting
existing types, or refactoring nested structures, it's essential that your
pipelines adapt without introducing instability.
+
+In real-world data lakehouse environments, schema changes are not just
common—they are expected. Whether you are adding new data attributes, adjusting
existing types, or refactoring nested structures, it is essential that your
pipelines adapt without introducing instability.
Apache Hudi supports powerful schema evolution capabilities that help you
maintain schema flexibility while ensuring data consistency. In this notebook,
we will explore how Hudi enables safe and efficient schema changes, both at
write time and read time.
**What you will learn:**
-- Schema Evolution on Write:
+
+* Schema Evolution on Write:
Apache Hudi allows safe, backward-compatible schema changes during write
operations. This ensures that you can evolve your schema without rewriting
existing data or breaking your ingestion pipelines.
+* Schema Evolution on Read:
+Hudi also supports schema evolution during reads, enabling more flexible
transformations that do not require rewriting the dataset.
-- Schema Evolution on Read:
-Hudi also supports schema evolution during reads, enabling more flexible
transformations that don't require rewriting the dataset.
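The read-side idea can be sketched simply: rows written before a column existed are projected onto the latest schema with the missing field filled as null. This is a hypothetical pure-Python illustration of the concept, not Hudi's schema-handling code.

```python
# Read-time schema evolution sketch: project old rows onto the latest
# schema, filling columns that did not exist when the row was written.

latest_schema = ["uuid", "fare", "tip"]   # "tip" added after old files landed

def project(row, schema):
    """Return `row` with exactly the columns of `schema`; missing -> None."""
    return {col: row.get(col) for col in schema}

old_file_row = {"uuid": "a", "fare": 10.0}                 # pre-evolution
new_file_row = {"uuid": "b", "fare": 25.0, "tip": 3.0}     # post-evolution
print(project(old_file_row, latest_schema))
# {'uuid': 'a', 'fare': 10.0, 'tip': None}
```

Because the projection happens at read time, no existing data files need to be rewritten when such a column is added.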
+#### 5 - A Hands-On Guide to Hudi SQL Procedures
-#### 5 - A Hands-on Guide to Hudi SQL Procedures
Apache Hudi provides a suite of powerful built-in procedures that can be
executed directly from Spark SQL using the familiar CALL syntax.
These procedures enable you to perform advanced table maintenance, auditing,
and data management tasks without writing any custom code or scripts. Whether
you are compacting data, cleaning old versions, or retrieving metadata, Hudi
SQL procedures make it easy and SQL-friendly.
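As a rough sketch of the CALL syntax the notebook covers: procedures are invoked as Spark SQL statements. The table name below is a placeholder, and since this sketch has no live SparkSession the statements are only assembled and printed, not executed; in the notebook they would be passed to `spark.sql(...)`.

```python
# Illustrative Hudi SQL procedure calls (placeholder table name).
# In a notebook with a configured SparkSession: spark.sql(s).show()

stmts = [
    "CALL show_commits(table => 'hudi_trips', limit => 5)",
    "CALL run_clean(table => 'hudi_trips')",
]
for s in stmts:
    print(s)
```

`show_commits` and `run_clean` are examples of the auditing and maintenance procedures referred to above; consult the Hudi documentation for the full procedure list and their parameters.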
diff --git a/website/docs/notebooks.md
b/website/versioned_docs/version-1.0.2/notebooks.md
similarity index 64%
copy from website/docs/notebooks.md
copy to website/versioned_docs/version-1.0.2/notebooks.md
index 07fd2304353f..46a123b7bfed 100644
--- a/website/docs/notebooks.md
+++ b/website/versioned_docs/version-1.0.2/notebooks.md
@@ -1,6 +1,6 @@
---
title: "Notebooks"
-keywords: [ hudi, notebooks]
+keywords: [ hudi, notebooks ]
toc: true
last_modified_at: 2025-10-09T19:13:57+08:00
---
@@ -13,62 +13,74 @@ All you need is a cloned copy of the Hudi repository and
Docker installed on you
### Setup
- * Clone the [Hudi repository](https://github.com/apache/hudi) to your local
machine.
- * Docker Setup : For Mac, Please follow the steps as defined in [Install
Docker Desktop on Mac](https://docs.docker.com/desktop/install/mac-install/).
For running Spark-SQL queries, please ensure atleast 6 GB and 4 CPUs are
allocated to Docker (See Docker -> Preferences -> Advanced).
- * This setup also needs JDK 8 and maven installed on your system.
- * Build Docker Images
- ```sh
- cd hudi-notebooks
- sh build.sh
- ```
- * Start the Environment
- ```sh
- sh run_spark_hudi.sh start
- ```
+* Clone the [Hudi repository](https://github.com/apache/hudi) to your local
machine.
+* Docker Setup: For macOS, follow the steps in [Install Docker Desktop on
Mac](https://docs.docker.com/desktop/install/mac-install/). For Spark SQL
queries, ensure at least 6 GB of memory and 4 CPUs are allocated to Docker (see
Docker > Preferences > Advanced).
+* Build Docker Images
+
+ ```shell
+ # under Hudi repo root dir
+ cd hudi-notebooks
+ sh build.sh
+ ```
+
+* Start the Environment
+
+ ```shell
+ sh run_spark_hudi.sh start
+ ```
### Meet Your Notebooks
+
#### 1 - Getting Started with Apache Hudi: A Hands-On Guide to CRUD Operations
-This notebook is a beginner friendly, practical guide to working with Apache
Hudi using PySpark. It walks you through the essential CRUD operations (Create,
Read, Update, Delete) on Hudi tables, while also helping you understand key
table types such as Copy-On-Write (COW) and Merge-On-Read (MOR).
-For storage, we use MinIO as an S3-compatible backend, simulating a modern
datalake setup.
+This notebook is a beginner-friendly, practical guide to working with Apache
Hudi using PySpark. It walks you through the essential CRUD operations (Create,
Read, Update, Delete) on Hudi tables, while also helping you understand key
table types such as Copy-On-Write (COW) and Merge-On-Read (MOR).
+
+For storage, we use MinIO as an S3-compatible backend.
**What you will learn:**
-- How to create and update Hudi tables using PySpark
-- The difference between COW and MOR tables
-- Reading data using snapshot and incremental queries
-- How Hudi handles upserts and deletes
+
+* How to create and update Hudi tables using PySpark
+* The difference between COW and MOR tables
+* Reading data using snapshot and incremental queries
+* How Hudi handles upserts and deletes
#### 2 - Deep Dive into Apache Hudi Table & Query Types: Snapshot, RO,
Incremental, Time Travel, CDC
+
This notebook is your hands-on guide to mastering Apache Hudi's advanced query
capabilities. You will explore practical examples of various read modes such as
Snapshot, Read-Optimized (RO), Incremental, Time Travel, and Change Data
Capture (CDC) so you can understand when and how to use each for building
efficient, real-world data pipelines.
**What you will learn:**
-- How to perform Snapshot and Read-Optimized queries
-- Using Incremental pulls for near real-time data processing
-- Querying historical data with Time Travel
-- Capturing changes with CDC for downstream consumption
+
+* How to perform Snapshot and Read-Optimized queries
+* Using incremental pulls for near-real-time data processing
+* Querying historical data with Time Travel
+* Capturing changes with CDC for downstream consumption
#### 3 - Implementing Slowly Changing Dimensions (SCD Type 2 & 4) with Apache
Hudi
+
Dive into this practical guide on implementing two key data warehousing
patterns - Slowly Changing Dimensions (SCD) Type 2 and Type 4 using Apache Hudi.
-SCDs help track changes in dimension data over time without losing historical
context. Instead of overwriting records, these patterns let you maintain a full
history of data changes. Leveraging Hudi's upsert capabilities and rich
metadata, this notebook simplifies what's traditionally a complex process.
+SCDs help track changes in dimension data over time without losing historical
context. Instead of overwriting records, these patterns let you maintain a full
history of data changes. Leveraging Hudi's upsert capabilities and rich
metadata, this notebook simplifies what is traditionally a complex process.
**What you will learn:**
-- SCD Type 2: How to track changes by adding new rows to your dimension tables
-- SCD Type 4: How to manage historical data in a separate history table
+
+* SCD Type 2: How to track changes by adding new rows to your dimension tables
+* SCD Type 4: How to manage historical data in a separate history table
#### 4 - Schema Evolution with Apache Hudi: Concepts and Practical Use
-In real-world data lake environments, schema changes are not just common but
they are expected. Whether you are adding new data attributes, adjusting
existing types, or refactoring nested structures, it's essential that your
pipelines adapt without introducing instability.
+
+In real-world data lakehouse environments, schema changes are not just
common—they are expected. Whether you are adding new data attributes, adjusting
existing types, or refactoring nested structures, it is essential that your
pipelines adapt without introducing instability.
Apache Hudi supports powerful schema evolution capabilities that help you
maintain schema flexibility while ensuring data consistency. In this notebook,
we will explore how Hudi enables safe and efficient schema changes, both at
write time and read time.
**What you will learn:**
-- Schema Evolution on Write:
+
+* Schema Evolution on Write:
Apache Hudi allows safe, backward-compatible schema changes during write
operations. This ensures that you can evolve your schema without rewriting
existing data or breaking your ingestion pipelines.
+* Schema Evolution on Read:
+Hudi also supports schema evolution during reads, enabling more flexible
transformations that do not require rewriting the dataset.
-- Schema Evolution on Read:
-Hudi also supports schema evolution during reads, enabling more flexible
transformations that don't require rewriting the dataset.
+#### 5 - A Hands-On Guide to Hudi SQL Procedures
-#### 5 - A Hands-on Guide to Hudi SQL Procedures
Apache Hudi provides a suite of powerful built-in procedures that can be
executed directly from Spark SQL using the familiar CALL syntax.
These procedures enable you to perform advanced table maintenance, auditing,
and data management tasks without writing any custom code or scripts. Whether
you are compacting data, cleaning old versions, or retrieving metadata, Hudi
SQL procedures make it easy and SQL-friendly.
diff --git a/website/versioned_sidebars/version-1.0.2-sidebars.json
b/website/versioned_sidebars/version-1.0.2-sidebars.json
index 51c1310df69e..b8634292d6ab 100644
--- a/website/versioned_sidebars/version-1.0.2-sidebars.json
+++ b/website/versioned_sidebars/version-1.0.2-sidebars.json
@@ -10,6 +10,7 @@
"flink-quick-start-guide",
"python-rust-quick-start-guide",
"docker_demo",
+ "notebooks",
"use_cases"
]
},
@@ -131,8 +132,7 @@
]
}
]
- },
- "privacy"
+ }
],
"quick_links": [
{