yihua commented on code in PR #13650:
URL: https://github.com/apache/hudi/pull/13650#discussion_r2357110885
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -258,18 +258,19 @@ public class HoodieWriteConfig extends HoodieConfig {
"**Note** This is being actively worked on. Please use "
+ "`hoodie.datasource.write.keygenerator.class` instead.");
- public static final ConfigProperty<Boolean> COMPLEX_KEYGEN_OLD_ENCODING =
ConfigProperty
- .key("hoodie.write.complex.keygen.old.encoding")
- .defaultValue(true)
+ // keep both configs for table version 8 and below
+ // change naming to be new and then flip default to false
+ public static final ConfigProperty<Boolean> COMPLEX_KEYGEN_NEW_ENCODING =
ConfigProperty
+ .key("hoodie.write.complex.keygen.new.encoding")
+ .defaultValue(false)
.markAdvanced()
- .sinceVersion("1.1.0")
Review Comment:
Taking back my previous comment, I think we still need to mark this with
`.sinceVersion("1.1.0")` since it is used for writing table version 8 and below.
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java:
##########
@@ -238,20 +238,39 @@ private <S> S combineCompositeRecordKeyInternal(
Predicate<S> isNullOrEmptyKeyPartPredicate,
Object... recordKeyParts
) {
- boolean hasNonNullNonEmptyPart = false;
+ if (recordKeyParts.length == 0) {
+ throw new HoodieKeyException(String.format("All of the values for (%s)
were either null or empty", recordKeyFields));
+ }
PartitionPathFormatterBase.StringBuilder<S> sb = builderFactory.get();
- for (int i = 0; i < recordKeyParts.length; ++i) {
+
+ if (recordKeyParts.length == 1) {
// NOTE: If record-key part has already been a string [[toString]] will
be a no-op
- S convertedKeyPart =
emptyKeyPartHandler.apply(converter.apply(recordKeyParts[i]));
+ S convertedKeyPart =
emptyKeyPartHandler.apply(converter.apply(recordKeyParts[0]));
- if (encodeSingleKeyFieldName || recordKeyParts.length > 1) {
- sb.appendJava(recordKeyFields.get(i));
+ if (encodeSingleKeyFieldName) {
+ sb.appendJava(recordKeyFields.get(0));
sb.appendJava(DEFAULT_COLUMN_VALUE_SEPARATOR);
}
sb.append(convertedKeyPart);
// This check is to validate that overall composite-key has at least one
non-null, non-empty
// segment
+ if (isNullOrEmptyKeyPartPredicate.test(convertedKeyPart)) {
+ throw new HoodieKeyException(String.format("All of the values for (%s)
were either null or empty", recordKeyFields));
+ }
+ return sb.build();
+ }
+
+ boolean hasNonNullNonEmptyPart = false;
+ for (int i = 0; i < recordKeyParts.length; ++i) {
+ // NOTE: If record-key part has already been a string [[toString]] will
be a no-op
+ S convertedKeyPart =
emptyKeyPartHandler.apply(converter.apply(recordKeyParts[i]));
+
+ sb.appendJava(recordKeyFields.get(i));
+ sb.appendJava(DEFAULT_COLUMN_VALUE_SEPARATOR);
+ sb.append(convertedKeyPart);
+ // This check is to validate that overall composite-key has at least one
non-null, non-empty
+ // segment
Review Comment:
Can this logic be simplified with just the change of `if
(encodeSingleKeyFieldName || recordKeyParts.length > 1) {`?
##########
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/keygen/TestComplexKeyGenerator.java:
##########
@@ -178,15 +180,16 @@ void testMultipleValueKeyGenerator(boolean
setEncodeSingleKeyFieldNameConfig,
@ParameterizedTest
@CsvSource(value = {"false,true", "true,false", "true,true"})
- void testMultipleValueKeyGeneratorNonPartitioned(boolean
setEncodeSingleKeyFieldNameConfig,
- boolean
encodeSingleKeyFieldName) {
+ void testMultipleValueKeyGeneratorNonPartitioned(boolean
setNewEncodingConfig,
+ boolean
encodeSingleKeyFieldValueOnly) {
TypedProperties properties = new TypedProperties();
properties.setProperty(KeyGeneratorOptions.RECORDKEY_FIELD_NAME.key(),
"_row_key,timestamp");
properties.setProperty(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME.key(),
"");
- if (setEncodeSingleKeyFieldNameConfig) {
+ properties.setProperty(HoodieWriteConfig.WRITE_TABLE_VERSION.key(), "8");
Review Comment:
For these tests let's also add table version 9 and validate that the
`COMPLEX_KEYGEN_NEW_ENCODING` config should not take effect.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]