umehrot2 commented on a change in pull request #2651:
URL: https://github.com/apache/hudi/pull/2651#discussion_r604479749
##########
File path:
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieWriterUtils.scala
##########
@@ -81,4 +81,33 @@ object HoodieWriterUtils {
params.foreach(kv => props.setProperty(kv._1, kv._2))
props
}
+
+ /**
+ * Get the partition columns to stored to hoodie.properties.
+ * Return the partitionColumns only if it is the key generator class is the build-ins.
+ * For other custom key generator class, we cannot know whether or not it has relation
+ * with the partition columns.
+ * @param parameters
+ * @return
+ */
+ def getPartitionColumns(parameters: Map[String, String]): Option[String] = {
+ val keyGenClass = parameters.getOrElse(KEYGENERATOR_CLASS_OPT_KEY,
+ DEFAULT_KEYGENERATOR_CLASS_OPT_VAL)
+ val partitionColumns = parameters.get(PARTITIONPATH_FIELD_OPT_KEY)
+ if (keyGenClass == classOf[SimpleKeyGenerator].getName ||
+ keyGenClass == classOf[ComplexKeyGenerator].getName ||
+ keyGenClass == classOf[TimestampBasedKeyGenerator].getName) {
Review comment:
I don't think this condition covers all cases: `CustomKeyGenerator` and
`CustomAvroKeyGenerator` also use the partition path field. Tracking the
classes individually is also risky, because developers who add a key generator
in the future can easily miss this list. Here is my recommendation:
- Pass the `KeyGenerator` created in `HoodieSparkSqlWriter` to this function.
- Check whether that `KeyGenerator` is an instance of `BaseKeyGenerator`.
- If it is, invoke `getPartitionPathFields` on the key generator and return
those fields as the partition columns, instead of reading them from
`PARTITIONPATH_FIELD_OPT_KEY`.
- This also covers cases where someone extends `BaseKeyGenerator` and uses
some other partition key.
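
A rough sketch of what I have in mind, in case it helps. The `KeyGenerator` and `BaseKeyGenerator` types below are simplified stand-ins so the snippet compiles on its own; they are not the real Hudi classes. `FieldBasedKeyGenerator`, `OpaqueKeyGenerator` and `PartitionColumnsSketch` are hypothetical names used only for illustration, and joining the fields with a comma is my assumption about how the value would be stored.

```scala
// Simplified stand-ins for the Hudi key generator types (assumptions for this
// sketch only; the real classes have more members).
trait KeyGenerator

abstract class BaseKeyGenerator extends KeyGenerator {
  // Partition path fields this generator actually uses.
  def getPartitionPathFields: Seq[String]
}

// Hypothetical generator that exposes its partition fields.
class FieldBasedKeyGenerator(fields: Seq[String]) extends BaseKeyGenerator {
  override def getPartitionPathFields: Seq[String] = fields
}

// Hypothetical generator that does not extend BaseKeyGenerator, so nothing
// can be said about its partition columns.
class OpaqueKeyGenerator extends KeyGenerator

object PartitionColumnsSketch {
  // Ask the key generator for its partition fields instead of matching on
  // class names and reading PARTITIONPATH_FIELD_OPT_KEY.
  def getPartitionColumns(keyGen: KeyGenerator): Option[String] = keyGen match {
    case base: BaseKeyGenerator =>
      val fields = base.getPartitionPathFields
      // Comma-joined output is an assumption about the stored format.
      if (fields.isEmpty) None else Some(fields.mkString(","))
    case _ =>
      // Unknown generator type: do not guess.
      None
  }
}
```

With this shape, any new subclass of `BaseKeyGenerator` is picked up automatically, and generators outside that hierarchy fall through to `None` rather than being silently mishandled.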