linliu-code commented on code in PR #13959:
URL: https://github.com/apache/hudi/pull/13959#discussion_r2377189006
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieWriterUtils.scala:
##########
@@ -217,28 +218,28 @@ object HoodieWriterUtils {
}
/**
- * This function adds specific rules to choose the right config key for payload class for version 9 tables.
- *
- * RULE 1: When
- * 1. table version is 9,
- * 2. writer key is a payload class key, and
- * 3. table config has legacy payload class configured,
- * then
- * return legacy payload class key.
- *
- * Basic rule:
- * return writer key.
+ * For table version >= 9, this function finds the corresponding key in `HoodieTableConfig`
+ * for a key in `HoodieWriteConfig`, including configs related to payload class, record merge mode, and merge strategy id.
*/
- def getPayloadClassConfigKeyFromTableConfig(key: String, tableConfig: HoodieConfig): String = {
- if (tableConfig == null) {
+ def getKeyInTableConfig(key: String, tableConfig: HoodieConfig): String = {
+ if (tableConfig == null || tableConfig.getInt(HoodieTableConfig.VERSION) < HoodieTableVersion.NINE.versionCode()) {
Review Comment:
Tried it; the result is very verbose:
`HoodieTableVersion.fromVersionCode(tableConfig.getInt(HoodieTableConfig.VERSION)).lesserThan(HoodieTableVersion.NINE)`.
Let's keep it this way; it is already clear.
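As a hedged illustration of the renamed helper and the terse version check defended above, here is a minimal stand-in sketch (hypothetical class and mapping, not the actual Hudi implementation; only the merge-mode key pair is taken from the surrounding diffs):

```java
import java.util.Map;

// Stand-in sketch of getKeyInTableConfig: for tables below version 9 the
// writer key is returned unchanged; for version >= 9 it is translated to the
// corresponding table-config key. Not the real Hudi code.
public class KeyMappingSketch {

    // One grounded example pair from this PR's diffs; payload-class and
    // merge-strategy-id keys would get similar (here omitted) entries.
    static final Map<String, String> WRITER_TO_TABLE_KEY = Map.of(
        "hoodie.write.record.merge.mode", "hoodie.record.merge.mode");

    static final int VERSION_NINE = 9;

    static String keyInTableConfig(String writerKey, int tableVersionCode) {
        // The terse int comparison kept in the PR; the enum-based
        // fromVersionCode(...).lesserThan(NINE) alternative the reviewer
        // tried reads the same but is much wordier.
        if (tableVersionCode < VERSION_NINE) {
            return writerKey;
        }
        return WRITER_TO_TABLE_KEY.getOrDefault(writerKey, writerKey);
    }

    public static void main(String[] args) {
        System.out.println(keyInTableConfig("hoodie.write.record.merge.mode", 9));
        System.out.println(keyInTableConfig("hoodie.write.record.merge.mode", 8));
    }
}
```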
##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestPartialUpdateForMergeInto.scala:
##########
@@ -681,9 +681,7 @@ class TestPartialUpdateForMergeInto extends HoodieSparkSqlTestBase {
test("Test MergeInto Partial Updates should fail with CUSTOM payload and merge mode") {
withTempDir { tmp =>
withSQLConf(
- "hoodie.index.type" -> "GLOBAL_SIMPLE",
- "hoodie.write.record.merge.mode" -> "CUSTOM",
- "hoodie.datasource.write.payload.class" ->
"org.apache.hudi.common.testutils.reader.HoodieRecordTestPayload") {
Review Comment:
With the original settings, the table properties after the `create_table`
statement are as follows. We can see that the merge mode, payload class, etc.
are not what we expect, so after adding the merge-mode check, this query fails.
Why did it work previously? I think these runtime merge configs took effect
during write/read by overwriting the table properties, which is exactly what
we want to forbid.
> #Properties saved on 2025-09-24T22:14:33.394Z
#Wed Sep 24 22:14:33 UTC 2025
hoodie.table.format=native
hoodie.table.version=9
hoodie.database.name=default
hoodie.table.initial.version=9
hoodie.datasource.write.hive_style_partitioning=true
hoodie.table.keygenerator.type=SIMPLE
hoodie.table.create.schema={"type"\:"record","name"\:"h1_record","namespace"\:"hoodie.h1","fields"\:[{"name"\:"record_key","type"\:["string","null"]},{"name"\:"name","type"\:["string","null"]},{"name"\:"age","type"\:["int","null"]},{"name"\:"salary","type"\:["double","null"]},{"name"\:"ts","type"\:["long","null"]},{"name"\:"department","type"\:["string","null"]}]}
hoodie.archivelog.folder=archived
hoodie.table.name=h1
hoodie.record.merge.strategy.id=eeb8d96f-b1e4-49fd-bbf8-28ac514178e5
hoodie.timeline.history.path=history
hoodie.table.type=MERGE_ON_READ
hoodie.datasource.write.partitionpath.urlencode=false
hoodie.datasource.write.drop.partition.columns=false
hoodie.timeline.layout.version=2
hoodie.record.merge.mode=EVENT_TIME_ORDERING
hoodie.table.partition.fields=department
hoodie.table.recordkey.fields=record_key
hoodie.timeline.path=timeline
hoodie.table.ordering.fields=ts
hoodie.table.checksum=997088624
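The failure mode described above can be sketched, under clearly hypothetical names (this is not the actual Hudi resolution code), as the difference between letting a writer-supplied option silently override the stored merge mode and rejecting the conflict:

```java
// Hedged sketch of the behavior discussed above: the table properties store
// EVENT_TIME_ORDERING, while the removed SQL conf supplied CUSTOM at runtime.
// Method names and logic are illustrative, not taken from Hudi.
public class MergeModeOverrideSketch {

    // Old behavior: the writer-supplied value, when present, wins over the
    // merge mode persisted in hoodie.properties.
    static String resolveOld(String storedMode, String writerMode) {
        return writerMode != null ? writerMode : storedMode;
    }

    // New behavior: a conflicting writer-supplied value is rejected rather
    // than applied, so the table config stays authoritative.
    static String resolveNew(String storedMode, String writerMode) {
        if (writerMode != null && !writerMode.equals(storedMode)) {
            throw new IllegalArgumentException(
                "Writer merge mode " + writerMode
                    + " conflicts with table config " + storedMode);
        }
        return storedMode;
    }

    public static void main(String[] args) {
        System.out.println(resolveOld("EVENT_TIME_ORDERING", "CUSTOM"));
        try {
            resolveNew("EVENT_TIME_ORDERING", "CUSTOM");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```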
##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestPartialUpdateAvroPayload.scala:
##########
@@ -122,8 +122,6 @@ class TestPartialUpdateAvroPayload extends HoodieClientTestBase {
.option(HoodieTableConfig.ORDERING_FIELDS.key, "ts")
.option(DataSourceWriteOptions.OPERATION.key, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
.option(HoodieWriteConfig.TBL_NAME.key, "hoodie_test")
- .option(HoodieWriteConfig.WRITE_PAYLOAD_CLASS_NAME.key, "org.apache.hudi.common.model.PartialUpdateAvroPayload")
- .option(HoodieWriteConfig.RECORD_MERGE_MODE.key(), RecordMergeMode.CUSTOM.name())
Review Comment:
Previously we did not perform this check; with the check added, this test
would fail. Therefore, I removed these configs.
##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestPartialUpdateForMergeInto.scala:
##########
@@ -681,9 +681,7 @@ class TestPartialUpdateForMergeInto extends HoodieSparkSqlTestBase {
test("Test MergeInto Partial Updates should fail with CUSTOM payload and merge mode") {
withTempDir { tmp =>
withSQLConf(
- "hoodie.index.type" -> "GLOBAL_SIMPLE",
- "hoodie.write.record.merge.mode" -> "CUSTOM",
- "hoodie.datasource.write.payload.class" ->
"org.apache.hudi.common.testutils.reader.HoodieRecordTestPayload") {
Review Comment:
During testing it does not behave that way; there may be some gaps. I am not
sure the previous test really worked as expected.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]