nsivabalan commented on code in PR #18650:
URL: https://github.com/apache/hudi/pull/18650#discussion_r3462211181
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala:
##########
@@ -1033,9 +1034,87 @@ object DataSourceOptionsHelper {
private val log = LoggerFactory.getLogger(DataSourceOptionsHelper.getClass)
// Prefix constants for config normalization
+ private val HOODIE_PREFIX = "hoodie."
private val SPARK_HOODIE_PREFIX = "spark.hoodie."
private val SPARK_PREFIX = "spark."
+ /**
+ * Collects `hoodie.*` and `spark.hoodie.*` configs from the SparkConf, normalizes the
+ * `spark.hoodie.*` keys to canonical `hoodie.*`, and merges with explicit DataFrame
+ * options. Explicit options win over SparkConf.
+ *
+ * This is the read-path entry point: reads have always picked up session-level `hoodie.*`
+ * confs (e.g. `hoodie.datasource.query.type`), so both prefixes are forwarded here.
+ * Do NOT use this for writes — see `collectSparkHoodieConfs` for why ambient `hoodie.*`
+ * confs must not be forwarded to the write path.
+ *
+ * Example (SparkConf has both prefixes set; explicit options override):
+ * {{{
+ * SparkConf: spark.hoodie.X = "a", hoodie.Y = "b"
+ * optParams: hoodie.X = "c"
+ * result: hoodie.X = "c" // explicit wins over both prefixes
+ * hoodie.Y = "b"
+ * }}}
+ */
+ def collectHoodieAndSparkHoodieConfs(sqlContext: SQLContext,
Review Comment:
Can we rename `collectHoodieAndSparkHoodieConfs` so that the name itself
differentiates the read path from the write path?
As of now, we rely on documentation to keep devs from making unintended changes
in the future, i.e. from calling the wrong method when refactoring or fixing bugs.
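
To make the suggestion concrete, here is a minimal sketch of what path-specific naming could look like. It is an illustration only, not the PR's implementation: the names `collectConfsForRead` and `collectConfsForWrite` are hypothetical, a plain `Map[String, String]` stands in for the SparkConf so the snippet runs without Spark, and the precedence between a session-level `hoodie.X` and `spark.hoodie.X` (the `spark.hoodie.*` value wins here) is an assumption the diff does not state.

```scala
// Sketch only: hypothetical names, plain Maps instead of SQLContext/SparkConf.
object ConfCollectionSketch {
  private val HoodiePrefix = "hoodie."
  private val SparkHoodiePrefix = "spark.hoodie."
  private val SparkPrefix = "spark."

  // Keeps only `spark.hoodie.*` entries and rewrites each key to `hoodie.*`.
  private def normalizedSparkHoodie(conf: Map[String, String]): Map[String, String] =
    conf.collect {
      case (k, v) if k.startsWith(SparkHoodiePrefix) => k.stripPrefix(SparkPrefix) -> v
    }

  // Keeps only entries whose key already starts with `hoodie.`.
  private def plainHoodie(conf: Map[String, String]): Map[String, String] =
    conf.filter { case (k, _) => k.startsWith(HoodiePrefix) }

  // Read path: both prefixes are forwarded; explicit options win.
  // Assumption: `spark.hoodie.*` overrides a session-level `hoodie.*` duplicate.
  def collectConfsForRead(sessionConf: Map[String, String],
                          optParams: Map[String, String]): Map[String, String] =
    plainHoodie(sessionConf) ++ normalizedSparkHoodie(sessionConf) ++ optParams

  // Write path: ambient `hoodie.*` confs are NOT forwarded; explicit options win.
  def collectConfsForWrite(sessionConf: Map[String, String],
                           optParams: Map[String, String]): Map[String, String] =
    normalizedSparkHoodie(sessionConf) ++ optParams
}
```

With the path in the method name, a call such as `collectConfsForRead(...)` inside write code stands out in review, instead of depending on someone having read the scaladoc.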