danny0405 commented on code in PR #8107:
URL: https://github.com/apache/hudi/pull/8107#discussion_r1186672625
##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/util/SparkKeyGenUtils.scala:
##########
@@ -41,14 +41,31 @@ object SparkKeyGenUtils {
* @return partition columns
*/
def getPartitionColumns(keyGenClass: String, typedProperties: TypedProperties): String = {
- if (keyGenClass.equals(classOf[CustomKeyGenerator].getCanonicalName) || keyGenClass.equals(classOf[CustomAvroKeyGenerator])) {
+ // For CustomKeyGenerator and CustomAvroKeyGenerator, the partition path filed format
+ // is: "field_name: field_type", we extract the field_name from the partition path field.
+ if (keyGenClass.equals(classOf[CustomKeyGenerator].getCanonicalName) || keyGenClass.equals(classOf[CustomAvroKeyGenerator].getCanonicalName)) {
typedProperties.getString(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME.key())
.split(",").map(pathField => {
pathField.split(CustomAvroKeyGenerator.SPLIT_REGEX)
.headOption.getOrElse(s"Illegal partition path field format: '$pathField' for ${keyGenClass}")}).mkString(",")
} else {
- typedProperties.getString(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME.key())
+ typedProperties.getString(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME.key(), "")
+ }
+ }
+
+ def getPartitionColumns(keyGen: KeyGenerator, typedProperties: TypedProperties): String = {
+ keyGen match {
+ // For CustomKeyGenerator and CustomAvroKeyGenerator, the partition path filed format
Review Comment:
Do we need this? Also, please remove the unused class import.
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/CustomKeyGenerator.java:
##########
@@ -157,12 +155,6 @@ private String getPartitionPath(Option<GenericRecord> record, Option<Row> row, O
return partitionPath.toString();
}
- private void validateRecordKeyFields() {
- if (getRecordKeyFieldNames() == null || getRecordKeyFieldNames().isEmpty()) {
- throw new HoodieKeyException("Unable to find field names for record key in cfg");
- }
Review Comment:
If we know the user set up the `UPSERT` operation explicitly, we can keep this
validation. Or maybe we can just move the validation to the keygen factory class.
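A rough, self-contained sketch of that idea: run the record key check once, at the point where the key generator is created, and only when the operation is an explicit `UPSERT`. The class name `KeyGenFactorySketch`, the method signature, and the plain string operation name are all hypothetical, and `IllegalArgumentException` stands in for `HoodieKeyException` so the snippet compiles on its own. It is not the actual factory code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a keygen factory; not the real Hudi API.
public class KeyGenFactorySketch {

  // Rejects the config only when the operation is an explicit upsert
  // and no record key fields are configured. Other operations pass through.
  public static void validateRecordKeyFields(String operation, List<String> recordKeyFields) {
    boolean isUpsert = "upsert".equalsIgnoreCase(operation);
    boolean noKeys = recordKeyFields == null || recordKeyFields.isEmpty();
    if (isUpsert && noKeys) {
      // IllegalArgumentException is an assumption here, used in place of HoodieKeyException.
      throw new IllegalArgumentException("Unable to find field names for record key in cfg");
    }
  }

  public static void main(String[] args) {
    // No record key, but not an upsert: accepted.
    validateRecordKeyFields("insert", Collections.<String>emptyList());
    // Upsert with a record key: accepted.
    validateRecordKeyFields("upsert", Arrays.asList("id"));
    try {
      // Upsert without a record key: rejected.
      validateRecordKeyFields("upsert", Collections.<String>emptyList());
      System.out.println("unexpected: no exception");
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

With the check in one place, the individual key generators would not each need their own `validateRecordKeyFields()`.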
##########
hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/keygen/TestCustomKeyGenerator.java:
##########
@@ -240,62 +240,6 @@ public void testInvalidPartitionKeyType(TypedProperties props) {
}
}
- @Test
- public void testNoRecordKeyFieldPropWithKeyGeneratorClass() {
- testNoRecordKeyFieldProp(true);
- }
-
- @Test
- public void testNoRecordKeyFieldPropWithKeyGeneratorType() {
- testNoRecordKeyFieldProp(false);
- }
-
- public void testNoRecordKeyFieldProp(boolean useKeyGeneratorClassName) {
- TypedProperties propsWithoutRecordKeyFieldProps = getPropsWithoutRecordKeyFieldProps(useKeyGeneratorClassName);
- try {
- BuiltinKeyGenerator keyGenerator =
- (BuiltinKeyGenerator) HoodieSparkKeyGeneratorFactory.createKeyGenerator(propsWithoutRecordKeyFieldProps);
-
- keyGenerator.getKey(getRecord());
- Assertions.fail("should fail when record key field is not provided!");
Review Comment:
We cannot remove all of the non-key validation from the code; we still need to
keep the tests for `UPSERT` and null-key validation.