yihua commented on code in PR #13650:
URL: https://github.com/apache/hudi/pull/13650#discussion_r2300175200
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/SparkBaseIndexSupport.scala:
##########
@@ -173,18 +188,14 @@ abstract class SparkBaseIndexSupport(spark: SparkSession,
     var recordKeyQueries: List[Expression] = List.empty
     var compositeRecordKeys: List[String] = List.empty
     val recordKeyOpt = getRecordKeyConfig
-
     val isComplexRecordKey = {
+      val keyGeneratorClassName = metaClient.getTableConfig.getKeyGeneratorClassName
       val fieldCount = recordKeyOpt.map(recordKeys => recordKeys.length).getOrElse(0)
-      val encodeFieldNameConfig = metaClient.getTableConfig.getProps.getProperty(
-        org.apache.hudi.config.HoodieWriteConfig.COMPLEX_KEYGEN_ENCODE_SINGLE_RECORD_KEY_FIELD_NAME.key(),
-        org.apache.hudi.config.HoodieWriteConfig.COMPLEX_KEYGEN_ENCODE_SINGLE_RECORD_KEY_FIELD_NAME.defaultValue().toString
-      ).toBoolean
-
+      val isUsingComplexKeyGen = isComplexKeyGenerator(keyGeneratorClassName)
       // Consider as complex if:
       // 1. Multiple fields (> 1), OR
-      // 2. Single field with complex keygen encoding enabled
-      (fieldCount > 1) || (fieldCount == 1 && encodeFieldNameConfig)
+      // 2. Using complex key generator with single field
+      fieldCount > 1 || (isUsingComplexKeyGen && fieldCount == 1)
Review Comment:
I agree that a uniform criterion for record key encoding would be cleaner; but
the reality is that there is no enforcement on which key generator is used with
respect to the number of record key fields and partition path fields. So the
uniform criterion cannot be applied, because of the possible case of a complex
keygen with a single record key field and multiple partition path fields. Even
with a single record key field and multiple partition path fields, the Complex
and Custom key generators encode record keys differently. So it is safer to
point-fix the complex keygen here.
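To make the encoding difference concrete, here is a minimal, self-contained sketch of why a single-field record key still needs keygen-aware handling: a simple-style key is just the value, while a complex-style key embeds the field name even when there is only one field. The class and method names below are illustrative only, not Hudi's actual APIs; the `field:value` shape is an assumption about the complex-keygen format for demonstration purposes.

```java
import java.util.List;

public class KeyEncodingSketch {
    // Simple-style encoding: the record key is just the field value.
    static String simpleKey(String value) {
        return value;
    }

    // Complex-style encoding: every field name is embedded, even for a
    // single field, so the same row yields a different key string. This is
    // why index lookups must know which key generator produced the key.
    static String complexKey(List<String> fields, List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(fields.get(i)).append(':').append(values.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(simpleKey("1"));                                      // 1
        System.out.println(complexKey(List.of("id"), List.of("1")));             // id:1
        System.out.println(complexKey(List.of("id", "ts"), List.of("1", "2020"))); // id:1,ts:2020
    }
}
```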
We discussed that the uniform criterion should be honored in a follow-up. To
start with, on the key generation page of the website, we need to instruct
users not to specify the key generator type or class unless necessary (since
the keygen implementation class is inferred in our code). We also need to
disallow a single record key field and a single partition path field with the
complex key generator. We'll also need to make the key generation behavior
consistent across key generator classes, and between Spark and Flink (note
that Flink does not support the CUSTOM keygen).
For now, we limit the scope of this PR to fixing the complex key generator
with a single record key field only.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]