yihua commented on code in PR #13650:
URL: https://github.com/apache/hudi/pull/13650#discussion_r2300175200
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/SparkBaseIndexSupport.scala:
##########
@@ -173,18 +188,14 @@ abstract class SparkBaseIndexSupport(spark: SparkSession,
     var recordKeyQueries: List[Expression] = List.empty
     var compositeRecordKeys: List[String] = List.empty
     val recordKeyOpt = getRecordKeyConfig
-
     val isComplexRecordKey = {
+      val keyGeneratorClassName = metaClient.getTableConfig.getKeyGeneratorClassName
       val fieldCount = recordKeyOpt.map(recordKeys => recordKeys.length).getOrElse(0)
-      val encodeFieldNameConfig = metaClient.getTableConfig.getProps.getProperty(
-        org.apache.hudi.config.HoodieWriteConfig.COMPLEX_KEYGEN_ENCODE_SINGLE_RECORD_KEY_FIELD_NAME.key(),
-        org.apache.hudi.config.HoodieWriteConfig.COMPLEX_KEYGEN_ENCODE_SINGLE_RECORD_KEY_FIELD_NAME.defaultValue().toString
-      ).toBoolean
-
+      val isUsingComplexKeyGen = isComplexKeyGenerator(keyGeneratorClassName)
       // Consider as complex if:
       // 1. Multiple fields (> 1), OR
-      // 2. Single field with complex keygen encoding enabled
-      (fieldCount > 1) || (fieldCount == 1 && encodeFieldNameConfig)
+      // 2. Using complex key generator with single field
+      fieldCount > 1 || (isUsingComplexKeyGen && fieldCount == 1)
Review Comment:
I agree that a uniform criterion for record key encoding would be cleaner; but
the reality is that there is no enforcement on which key generator is used with
respect to the number of record key fields and partition path fields. So the
uniform criterion cannot be applied, because of the possible case of a complex
keygen with a single record key field and multiple partition path fields. Even
with a single record key field and multiple partition path fields, the Complex
and Custom key generators encode record keys differently. So it is safer to
point-fix the complex keygen here.
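To make the encoding difference concrete, here is a minimal, self-contained sketch of why a single-field record key still needs keygen-aware handling: a simple-style key is just the value, while a complex-style key embeds the field name even when there is only one field. The class and method names below are illustrative only, not Hudi's actual APIs; the `field:value` shape is an assumption about the complex-keygen format for demonstration purposes.

```java
import java.util.List;

public class KeyEncodingSketch {
    // Simple-style encoding: the record key is just the field value.
    static String simpleKey(String value) {
        return value;
    }

    // Complex-style encoding: every field name is embedded, even for a
    // single field, so the same row yields a different key string. This is
    // why index lookups must know which key generator produced the key.
    static String complexKey(List<String> fields, List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(fields.get(i)).append(':').append(values.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(simpleKey("1"));                                      // 1
        System.out.println(complexKey(List.of("id"), List.of("1")));             // id:1
        System.out.println(complexKey(List.of("id", "ts"), List.of("1", "2020"))); // id:1,ts:2020
    }
}
```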
We discussed that the uniform criterion should be honored in a follow-up. To
start with, on the key generation page of the website, we need to instruct
users not to specify the key generator type or class unless necessary (since
the keygen implementation class is inferred in our code). We also need to
disallow a single record key field and a single partition path field with the
complex key generator. We'll also need to make the key generation behavior
consistent across key generator classes, and between Spark and Flink (note
that Flink does not support the CUSTOM keygen).
For now, we limit the scope of this PR to fixing the complex key generator
with a single record key field only.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]