umehrot2 commented on a change in pull request #2651:
URL: https://github.com/apache/hudi/pull/2651#discussion_r604479749



##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieWriterUtils.scala
##########
@@ -81,4 +81,33 @@ object HoodieWriterUtils {
     params.foreach(kv => props.setProperty(kv._1, kv._2))
     props
   }
+
+  /**
+   * Get the partition columns to stored to hoodie.properties.
+   * Return the partitionColumns only if it is the key generator class is the build-ins.
+   * For other custom key generator class, we cannot know whether or not it has relation
+   * with the partition columns.
+   * @param parameters
+   * @return
+   */
+  def getPartitionColumns(parameters: Map[String, String]): Option[String] = {
+    val  keyGenClass = parameters.getOrElse(KEYGENERATOR_CLASS_OPT_KEY,
+      DEFAULT_KEYGENERATOR_CLASS_OPT_VAL)
+    val partitionColumns = parameters.get(PARTITIONPATH_FIELD_OPT_KEY)
+    if (keyGenClass == classOf[SimpleKeyGenerator].getName ||
+        keyGenClass == classOf[ComplexKeyGenerator].getName ||
+        keyGenClass == classOf[TimestampBasedKeyGenerator].getName) {

Review comment:
       I don't think this condition covers all cases. `CustomKeyGenerator` and 
`CustomAvroKeyGenerator` also use the partition path field. Tracking these 
classes individually is also risky, because future developers can easily miss 
this list when they add a new key generator. Here is my recommendation:
   - Pass the `KeyGenerator` created in `HoodieSparkSqlWriter` to this function.
   - Check whether the `KeyGenerator` is an instance of `BaseKeyGenerator`.
   - If it is, invoke `getPartitionPathFields` on the key generator and return 
those as the partition columns instead of reading them from 
`PARTITIONPATH_FIELD_OPT_KEY`.
   - This also covers cases where someone extends `BaseKeyGenerator` and 
uses some other partition key.
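
A minimal sketch of the recommendation above, written so it compiles without 
Hudi on the classpath. The `KeyGenerator` and `BaseKeyGenerator` types here are 
simplified stand-ins, not the real Hudi classes; `SingleFieldKeyGen`, 
`MultiFieldKeyGen`, `OpaqueKeyGen`, and `PartitionColumnsSketch` are 
hypothetical names, and joining the fields with a comma is an assumption about 
how the value would be stored:

```scala
// Simplified stand-ins for Hudi's key generator hierarchy (assumption:
// the real BaseKeyGenerator exposes its partition path fields similarly).
trait KeyGenerator
abstract class BaseKeyGenerator extends KeyGenerator {
  def getPartitionPathFields: Seq[String]
}

// Hypothetical generator with a single partition field.
class SingleFieldKeyGen(field: String) extends BaseKeyGenerator {
  override def getPartitionPathFields: Seq[String] = Seq(field)
}

// Hypothetical user-defined extension with several partition fields.
class MultiFieldKeyGen(fields: Seq[String]) extends BaseKeyGenerator {
  override def getPartitionPathFields: Seq[String] = fields
}

// Hypothetical generator outside the BaseKeyGenerator hierarchy; nothing
// can be said about its relation to the partition columns.
class OpaqueKeyGen extends KeyGenerator

object PartitionColumnsSketch {
  // Ask the key generator itself for its partition fields instead of
  // comparing class names against a fixed list.
  def getPartitionColumns(keyGen: KeyGenerator): Option[String] = keyGen match {
    case base: BaseKeyGenerator => Some(base.getPartitionPathFields.mkString(","))
    case _                      => None
  }
}
```

With this shape, any subclass of `BaseKeyGenerator` is handled without touching 
the function again, and only generators outside that hierarchy yield `None`.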




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


