danny0405 commented on code in PR #13650:
URL: https://github.com/apache/hudi/pull/13650#discussion_r2354559671
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/SparkBaseIndexSupport.scala:
##########
@@ -179,9 +179,16 @@ abstract class SparkBaseIndexSupport(spark: SparkSession,
// For tables with version < 9, single record key field, and using complex key generator,
// avoid using the index due to ambiguity in record key encoding
+
+ // Test plan: create a version 8 table containing keys written with both the old and the new encoding,
+ // enable the record level index (RLI),
+ // and run a query that exercises this skipping logic.
+ // Validate the query result with data skipping enabled: before the fix the result is wrong;
+ // after the fix, no pruning is done and the correct result is returned.
+ // TestRecordLevelIndex verifies this; the record level index tests are extended to cover key encoding.
+
val tableVersion = metaClient.getTableConfig.getTableVersion
val shouldSkipIndex = tableVersion.lesserThan(HoodieTableVersion.NINE) &&
- fieldCount == 1 &&
Review Comment:
nit: or we can have an `isComplexKeyGenerator` method so we don't parse the record key field count twice.
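
The suggested refactor could look like the sketch below. The helper and its call site are illustrative only: the exact accessor names on `HoodieTableConfig` and the surrounding condition are assumptions, not the PR's actual implementation.

```scala
// Hypothetical sketch of the reviewer's suggestion: classify the key
// generator once, rather than re-deriving it from the record key fields
// at each call site. getKeyGeneratorClassName is assumed to return the
// configured key generator class name (possibly null).
private def isComplexKeyGenerator(tableConfig: HoodieTableConfig): Boolean = {
  val keyGenClass = Option(tableConfig.getKeyGeneratorClassName).getOrElse("")
  // Treat the table as using a complex key generator based on the
  // configured class, even when only one record key field is set.
  keyGenClass.endsWith("ComplexKeyGenerator")
}

// Possible call site, mirroring the condition in the diff above:
// val shouldSkipIndex = tableVersion.lesserThan(HoodieTableVersion.NINE) &&
//   isComplexKeyGenerator(metaClient.getTableConfig)
```

This keeps the version/encoding check in one place and avoids parsing the record key fields a second time.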
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]