[GitHub] [hudi] danny0405 commented on a diff in pull request #8107: [HUDI-5514][HUDI-5574][HUDI-5604][HUDI-5535] Adding auto generation of record keys support to Hudi/Spark

via GitHub Sat, 06 May 2023 02:50:31 -0700


danny0405 commented on code in PR #8107:
URL: https://github.com/apache/hudi/pull/8107#discussion_r1186672625



##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/util/SparkKeyGenUtils.scala:
##########
@@ -41,14 +41,31 @@ object SparkKeyGenUtils {
    * @return partition columns
    */
   def getPartitionColumns(keyGenClass: String, typedProperties: TypedProperties): String = {
-
-    if (keyGenClass.equals(classOf[CustomKeyGenerator].getCanonicalName) || keyGenClass.equals(classOf[CustomAvroKeyGenerator])) {
+    // For CustomKeyGenerator and CustomAvroKeyGenerator, the partition path filed format
+    // is: "field_name: field_type", we extract the field_name from the partition path field.
+    if (keyGenClass.equals(classOf[CustomKeyGenerator].getCanonicalName) || keyGenClass.equals(classOf[CustomAvroKeyGenerator].getCanonicalName)) {
       typedProperties.getString(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME.key())
         .split(",").map(pathField => {
         pathField.split(CustomAvroKeyGenerator.SPLIT_REGEX)
           .headOption.getOrElse(s"Illegal partition path field format: '$pathField' for ${keyGenClass}")}).mkString(",")
     } else {
-      typedProperties.getString(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME.key())
+      typedProperties.getString(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME.key(), "")
+    }
+  }
+
+  def getPartitionColumns(keyGen: KeyGenerator, typedProperties: TypedProperties): String = {
+    keyGen match {
+      // For CustomKeyGenerator and CustomAvroKeyGenerator, the partition path filed format

Review Comment:
   Do we need this? Also, please remove the unused class import.
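
For context, the extraction that both `getPartitionColumns` overloads perform can be sketched in plain Java as below. This is an illustration only, not Hudi code: the class name `PartitionFieldSketch` is hypothetical, and the `:` separator is an assumption standing in for `CustomAvroKeyGenerator.SPLIT_REGEX`, based on the "field_name: field_type" format described in the diff comment.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of Hudi.
final class PartitionFieldSketch {
  // Assumed separator between field name and field type.
  // The real code uses CustomAvroKeyGenerator.SPLIT_REGEX instead.
  private static final String SPLIT_REGEX = ":";

  // Turns "region:simple,ts:timestamp" into "region,ts".
  // A missing or empty property yields "", mirroring the "" default in the diff.
  static String extractFieldNames(String partitionPathFields) {
    if (partitionPathFields == null || partitionPathFields.isEmpty()) {
      return "";
    }
    return Arrays.stream(partitionPathFields.split(","))
        // limit = 2 keeps trailing empty parts, so index 0 always exists
        .map(field -> field.split(SPLIT_REGEX, 2)[0].trim())
        .collect(Collectors.joining(","));
  }

  public static void main(String[] args) {
    System.out.println(extractFieldNames("region:simple,ts:timestamp")); // prints "region,ts"
  }
}
```

If both overloads delegated to one helper of this shape, the duplicated comment and match logic would live in a single place.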



##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/CustomKeyGenerator.java:
##########
@@ -157,12 +155,6 @@ private String getPartitionPath(Option<GenericRecord> record, Option<Row> row, O
     return partitionPath.toString();
   }
 
-  private void validateRecordKeyFields() {
-    if (getRecordKeyFieldNames() == null || getRecordKeyFieldNames().isEmpty()) {
-      throw new HoodieKeyException("Unable to find field names for record key in cfg");
-    }

Review Comment:
   If we know the user explicitly set the `UPSERT` operation, we can keep this 
validation. Alternatively, we could move the validation to the keygen factory class.
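
A minimal sketch of what operation-aware validation could look like if it lived in a factory-style class. Everything here is hypothetical: `KeyGenValidationSketch` and its `Operation` enum are stand-ins for illustration, and `IllegalArgumentException` replaces `HoodieKeyException` only to keep the example self-contained.

```java
import java.util.List;

// Hypothetical sketch; names and types are assumptions, not Hudi APIs.
final class KeyGenValidationSketch {
  // Stand-in for the write operation the user configured.
  enum Operation { INSERT, UPSERT, BULK_INSERT }

  // Rejects a missing record key configuration only for UPSERT,
  // so other operations can still rely on auto-generated keys.
  static void validateRecordKeyFields(Operation operation, List<String> recordKeyFields) {
    boolean missing = recordKeyFields == null || recordKeyFields.isEmpty();
    if (operation == Operation.UPSERT && missing) {
      throw new IllegalArgumentException("Unable to find field names for record key in cfg");
    }
  }

  public static void main(String[] args) {
    // Passes: INSERT without record key fields is allowed in this sketch.
    validateRecordKeyFields(Operation.INSERT, null);
    // Passes: UPSERT with a configured record key field.
    validateRecordKeyFields(Operation.UPSERT, List.of("_row_key"));
  }
}
```

Calling this once at key generator creation time would keep the check out of the per-record path while still failing fast for explicit upserts.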



##########
hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/keygen/TestCustomKeyGenerator.java:
##########
@@ -240,62 +240,6 @@ public void testInvalidPartitionKeyType(TypedProperties props) {
     }
   }
 
-  @Test
-  public void testNoRecordKeyFieldPropWithKeyGeneratorClass() {
-    testNoRecordKeyFieldProp(true);
-  }
-
-  @Test
-  public void testNoRecordKeyFieldPropWithKeyGeneratorType() {
-    testNoRecordKeyFieldProp(false);
-  }
-
-  public void testNoRecordKeyFieldProp(boolean useKeyGeneratorClassName) {
-    TypedProperties propsWithoutRecordKeyFieldProps = getPropsWithoutRecordKeyFieldProps(useKeyGeneratorClassName);
-    try {
-      BuiltinKeyGenerator keyGenerator =
-          (BuiltinKeyGenerator) HoodieSparkKeyGeneratorFactory.createKeyGenerator(propsWithoutRecordKeyFieldProps);
-
-      keyGenerator.getKey(getRecord());
-      Assertions.fail("should fail when record key field is not provided!");

Review Comment:
   We cannot remove all of the validation for missing record keys. We still need 
to keep tests that cover `UPSERT` and null key validation.
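
As a rough illustration of the null key case, the check and a test-style probe could look like the sketch below. The class `NullKeyCheckSketch`, its methods, and the exception type are all hypothetical; a real test would go through the key generator and an assertion library rather than this plain-Java probe.

```java
// Hypothetical sketch; not the Hudi implementation or its test harness.
final class NullKeyCheckSketch {
  // Returns the key as a string, rejecting null or empty values.
  static String requireRecordKey(String fieldName, Object value) {
    if (value == null || value.toString().isEmpty()) {
      throw new IllegalStateException(
          "recordKey value for field \"" + fieldName + "\" cannot be null or empty");
    }
    return value.toString();
  }

  // Mirrors the try/fail/catch shape of the removed test:
  // true means the value was rejected, false means it was accepted.
  static boolean rejects(String fieldName, Object value) {
    try {
      requireRecordKey(fieldName, value);
      return false;
    } catch (IllegalStateException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(rejects("_row_key", null));   // prints "true"
    System.out.println(rejects("_row_key", "key1")); // prints "false"
  }
}
```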



