rangareddy opened a new pull request, #18939:
URL: https://github.com/apache/hudi/pull/18939

   ### Describe the issue this Pull Request addresses
   
   The `hudi-notebooks` demo environment previously shipped only a Spark 3 image. 
This PR adds a parallel Spark 4 image and improves the existing notebooks. Changes are 
scoped entirely to the `hudi-notebooks` module (a Docker Compose demo); no 
production code or public APIs are touched.
   
   ### Summary and Changelog
   
   **Spark 4 stack**
   - `Dockerfile.spark4` — Spark 4.0.2 / Scala 2.13 / Java 17 / Hudi 1.1.1. 
Uses the `hudi-spark4.0-bundle_2.13` bundle and AWS SDK v2 
(`software.amazon.awssdk:bundle`), since Hadoop 3.4.x migrated S3A off the v1 
SDK.
   - `conf/spark4/`, `requirements-spark4.txt` (adds the `hudi`/hudi-rs Python 
package), and a `spark4-hudi` `docker-compose` service on non-colliding ports 
with its own `data/spark4-events` mount.
   - `build.sh` gains a parallel `SPARK4_*` version block and a build step 
tagging `apachehudi/spark4-hudi`.
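
As a rough illustration of how the bundle name above follows from the version triple, here is a minimal sketch. The helper name `hudi_bundle_coordinate` is hypothetical and is not taken from `build.sh` or `utils.py`; it only assumes the usual `hudi-spark<major.minor>-bundle_<scala>` artifact naming under the `org.apache.hudi` group.

```python
def hudi_bundle_coordinate(spark_version: str, scala_version: str, hudi_version: str) -> str:
    """Build the Maven coordinate of the Hudi Spark bundle for a version triple."""
    # Hudi bundles are keyed by Spark major.minor only, e.g. "4.0.2" -> "4.0".
    spark_major_minor = ".".join(spark_version.split(".")[:2])
    artifact = f"hudi-spark{spark_major_minor}-bundle_{scala_version}"
    return f"org.apache.hudi:{artifact}:{hudi_version}"


if __name__ == "__main__":
    # Versions listed for Dockerfile.spark4 in this description.
    print(hudi_bundle_coordinate("4.0.2", "2.13", "1.1.1"))
```

Keeping the derivation in one place would let a Spark 3 and a Spark 4 version block differ only in their three version strings.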
   
   **Notebook reorganization**
   - Notebooks split into `common/` (shared, baked into both images) and 
`spark3/` + `spark4/`, each with its own `utils.py` (differing only in default 
Spark/Scala/Hudi versions).
   - New Spark 4 hudi-rs example notebook — write with Spark, then query with 
the native `hudi-rs` reader (snapshot, partition-filter, time-travel, 
incremental).
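
For readers unfamiliar with hudi-rs, the Spark-free read path might look roughly like the sketch below. It assumes the `HudiTableBuilder` API as documented in the hudi-rs README (`from_base_uri`, `read_snapshot`, `read_incremental_records`); the function names, table path, and filter column are hypothetical, and the notebook in this PR may be structured differently. Time-travel reads are not shown here.

```python
def read_batches(table, partition_filters=None, incremental_range=None):
    """Return Arrow record batches for a snapshot or an incremental range."""
    if incremental_range is not None:
        # Incremental read: records committed between the two timestamps.
        start_ts, end_ts = incremental_range
        return table.read_incremental_records(start_ts, end_ts)
    # Snapshot read; filters are (field, operator, value) string tuples.
    return table.read_snapshot(filters=partition_filters or [])


def read_with_hudi_rs(base_uri, partition_filters=None, incremental_range=None):
    """Open a Hudi table with the hudi-rs Python bindings and read it."""
    # Imported lazily so this module loads even where `hudi` is not installed.
    from hudi import HudiTableBuilder

    table = HudiTableBuilder.from_base_uri(base_uri).build()
    return read_batches(table, partition_filters, incremental_range)


# Hypothetical usage, after Spark has written the table:
# batches = read_with_hudi_rs("/tmp/trips_table", [("city", "=", "san_francisco")])
```

The returned batches are Arrow record batches, so they can be combined with `pyarrow.Table.from_batches` for further inspection.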
   
   **Runtime / fixes**
   - Notebooks run against the in-container Spark standalone master with 4g 
driver/executor memory (in `spark-defaults.conf`); each service pins a 
deterministic `hostname`.
   - The Hudi bundle is added to driver + executor `extraClassPath` (resolved 
locally, downloaded if missing) to avoid a metadata-table `ClassCastException` 
on the standalone cluster.
   - `hoodie.write.table.version=6` added to the Presto example (and left 
commented out in the Trino example) for query-engine compatibility, with explanatory notes.
   - Silenced the AWS SDK v1 deprecation banner in the Presto image; fixed a 
Spark 4 ANSI-mode string-concat error in the SCD notebook (`+` -> `concat_ws`).
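
Two of the runtime items above can be sketched in plain Python. This is an illustration under assumptions, not the code in this PR: the helper names and the jar directory are hypothetical, and the download URL simply follows the standard Maven Central layout. The Spark and Hudi keys used (`spark.driver.extraClassPath`, `spark.executor.extraClassPath`, `hoodie.write.table.version`) are the ones named in the description.

```python
import urllib.request
from pathlib import Path

MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def resolve_bundle_jar(jar_dir, artifact, version):
    """Return the local path of the bundle jar, downloading it only if missing."""
    jar_path = Path(jar_dir) / f"{artifact}-{version}.jar"
    if not jar_path.exists():
        jar_path.parent.mkdir(parents=True, exist_ok=True)
        url = f"{MAVEN_CENTRAL}/org/apache/hudi/{artifact}/{version}/{jar_path.name}"
        urllib.request.urlretrieve(url, str(jar_path))
    return str(jar_path)


def classpath_conf(jar_path, memory="4g"):
    """Spark settings that put the bundle on both driver and executor classpaths."""
    return {
        "spark.driver.extraClassPath": jar_path,
        "spark.executor.extraClassPath": jar_path,
        "spark.driver.memory": memory,
        "spark.executor.memory": memory,
    }


def hudi_write_options(table_name, table_version=None):
    """Hudi writer options, optionally pinning the table version for older readers."""
    options = {"hoodie.table.name": table_name}
    if table_version is not None:
        options["hoodie.write.table.version"] = str(table_version)
    return options
```

The classpath settings only take effect if they are in place before the driver JVM starts, which is one reason to keep them in `spark-defaults.conf`. Passing `table_version=6` corresponds to the Presto example, and leaving it as `None` corresponds to the commented-out line in the Trino example.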
   
   ### Impact
   
   No public API or production behavior change. Affects only the 
`hudi-notebooks` demo/dev environment (new image, notebooks, docs).
   
   ### Risk Level
   
   low
   
   Isolated to the `hudi-notebooks` demo module. Configs and notebooks were 
validated statically (shell syntax, `docker compose config`, notebook JSON / 
Python). The Docker images have not yet been built/run end-to-end.
   
   ### Documentation Update
   
   Updated `hudi-notebooks/README.md` and `hudi-notebooks/CLAUDE.md` for the 
new Spark 4 image, services, and notebook layout. No Hudi website/config 
changes.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Enough context is provided in the sections above
   - [ ] Adequate tests were added if applicable


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

