rahil-c opened a new pull request, #19049: URL: https://github.com/apache/hudi/pull/19049
## What

Ports and finalizes #18650 (by @prashantwason, with a follow-up review commit by @nsivabalan) onto current `master`. Closes #18649.

The read side of `DefaultSource` collects `hoodie.*` and `spark.hoodie.*` keys from `sqlContext.getAllConfs` and normalizes them. The write side did not, so configs set via `--conf spark.hoodie.X=Y` were silently dropped on writes (e.g. `hoodie.datasource.hive_sync.use_spark_catalog`). This brings the write path to parity with the read path.

Two helpers in `DataSourceOptionsHelper`:

- `collectHoodieAndSparkHoodieConfs(sqlContext, optParams)`: pulls `hoodie.*` and `spark.hoodie.*` from SparkConf, normalizes `spark.hoodie.*` → canonical `hoodie.*`, and merges the result with the explicit options (explicit options win).
- `normalizeSparkHoodiePrefix(parameters)`: strips `spark.` from `spark.hoodie.*` keys; the canonical `hoodie.*` key wins on conflict; idempotent.

Wired symmetrically into the read/write `createRelation`, `parametersWithReadDefaults`, and `parametersWithWriteDefaults`. The normalization in `parametersWithWriteDefaults` also extends parity to the SQL `ALTER TABLE` and CLI call sites.

## Provenance

- Commit 1: original fix (Prashant Wason).
- Commit 2: review follow-up, normalizing at collection time and adding helper tests (sivabalan).
- Commit 3: *added during port*; addresses the inline review nit by renaming the `normalizeSparkHoodiePrefix` local (`normalized` → `rekeyedSparkHoodie`) for clarity. No behavior change.

## Verification

- Deep-reviewed the full change: precedence is preserved across read and write paths (explicit options > `hoodie.*` > `spark.hoodie.*` > global props), `normalizeSparkHoodiePrefix` is a safe no-op for maps without the prefix, comments are accurate, and the added tests assert real values plus the precedence/idempotence contracts (not just counts).
- Build-verified: `mvn install -pl hudi-spark-datasource/hudi-spark-common -am -DskipTests -Dspark3.5 -Dscala-2.12` → BUILD SUCCESS (test sources compile). Checkstyle clean.
- Tests not run locally here; the original PR's last Azure run showed a failure that should be re-checked on this rebased branch.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
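The prefix normalization and precedence rules described for the two `DataSourceOptionsHelper` helpers can be illustrated outside Spark. What follows is a minimal Python sketch under stated assumptions, not the Scala implementation in this PR. Plain dicts stand in for `sqlContext.getAllConfs` and the option maps, and the snake_case function names are hypothetical. Global props are not modeled. Normalizing the explicit options before overlaying them is also an assumption.

```python
from typing import Dict

# Hypothetical stand-ins for the Scala helpers in DataSourceOptionsHelper.
SPARK_PREFIX = "spark."
SPARK_HOODIE_PREFIX = "spark.hoodie."
HOODIE_PREFIX = "hoodie."


def normalize_spark_hoodie_prefix(parameters: Dict[str, str]) -> Dict[str, str]:
    """Rewrite spark.hoodie.X keys to hoodie.X; an existing hoodie.X wins.

    Returns a new dict. Maps without spark.hoodie.* keys come back unchanged,
    and applying the function twice equals applying it once (idempotent).
    """
    # Copy every key that is not spark.hoodie.* first (including canonical hoodie.* keys).
    result = {k: v for k, v in parameters.items() if not k.startswith(SPARK_HOODIE_PREFIX)}
    for key, value in parameters.items():
        if key.startswith(SPARK_HOODIE_PREFIX):
            canonical = key[len(SPARK_PREFIX):]  # "spark.hoodie.x" -> "hoodie.x"
            # setdefault never overwrites, so a canonical hoodie.* value wins on conflict.
            result.setdefault(canonical, value)
    return result


def collect_hoodie_and_spark_hoodie_confs(
    session_confs: Dict[str, str], opt_params: Dict[str, str]
) -> Dict[str, str]:
    """Pick hoodie.* / spark.hoodie.* session confs, normalize, then overlay explicit options."""
    relevant = {
        k: v
        for k, v in session_confs.items()
        if k.startswith(HOODIE_PREFIX) or k.startswith(SPARK_HOODIE_PREFIX)
    }
    merged = normalize_spark_hoodie_prefix(relevant)
    # Assumption: explicit options are normalized too, then applied last so they win.
    merged.update(normalize_spark_hoodie_prefix(opt_params))
    return merged
```

For example, a session conf of `spark.hoodie.datasource.hive_sync.use_spark_catalog=true` comes out as `hoodie.datasource.hive_sync.use_spark_catalog=true`, while unrelated keys such as `spark.sql.shuffle.partitions` are dropped. Copying the non-prefixed keys before re-keying is what makes the canonical `hoodie.*` value win regardless of map iteration order.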
