jonvex commented on code in PR #11943:
URL: https://github.com/apache/hudi/pull/11943#discussion_r1811172853
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieWriterUtils.scala:
##########
@@ -167,9 +166,21 @@ object HoodieWriterUtils {
if (!isOverWriteMode) {
val resolver = spark.sessionState.conf.resolver
val diffConfigs = StringBuilder.newBuilder
+ val payloadIsExpressionPayload = params.getOrElse(PAYLOAD_CLASS_NAME.key(), "").equals("org.apache.spark.sql.hudi.command.payload.ExpressionPayload")
params.foreach { case (key, value) =>
+ var ignoreConfig = false
// Base file format can change between writes, so ignore it.
- if (!HoodieTableConfig.BASE_FILE_FORMAT.key.equals(key)) {
+ ignoreConfig = ignoreConfig || HoodieTableConfig.BASE_FILE_FORMAT.key.equals(key)
+
+ //expression payload will never be the table config so skip validation of merge configs
+ ignoreConfig = ignoreConfig || (payloadIsExpressionPayload && (key.equals(PAYLOAD_CLASS_NAME.key())
+ || key.equals(HoodieTableConfig.PAYLOAD_CLASS_NAME.key()) || key.equals(RECORD_MERGE_MODE.key())
+ || key.equals(RECORD_MERGER_STRATEGY_ID.key())))
+
+ //don't validate the payload only in the case that insert into is using fallback to some legacy configs
+ ignoreConfig = ignoreConfig || (key.equals(PAYLOAD_CLASS_NAME.key()) && value.equals("org.apache.spark.sql.hudi.command.ValidateDuplicateKeyPayload"))
Review Comment:
We need to skip validation on all of those keys. Suppose you are just using
the defaults. The table will have:
payload = default
merger strategy = default id
merge mode = event time
Then, when we do MIT (MERGE INTO), the input configs will be:
payload = expression payload
merger strategy = payload based strategy
merge mode = custom
All three differ from the table config, so we don't want to validate any of
those keys.
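
To make the scenario above concrete, here is a minimal standalone sketch of the skip logic. It is not the code in `HoodieWriterUtils`: the object name, the `conflictingKeys` helper, and the three `example.*` key strings are hypothetical placeholders standing in for `PAYLOAD_CLASS_NAME`, `RECORD_MERGE_MODE` and `RECORD_MERGER_STRATEGY_ID`, and the config values are made up for illustration.

```scala
object MergeConfigValidationSketch {
  // Placeholder key names (assumptions, not the real Hudi config keys).
  val PayloadKey = "example.payload.class"
  val MergeModeKey = "example.merge.mode"
  val MergerStrategyKey = "example.merger.strategy.id"
  val ExpressionPayload = "org.apache.spark.sql.hudi.command.payload.ExpressionPayload"

  // Keys that are skipped together whenever the incoming payload is ExpressionPayload.
  private val mergeKeys = Set(PayloadKey, MergeModeKey, MergerStrategyKey)

  // Returns the keys whose incoming value differs from the table config,
  // ignoring all merge-related keys when the write uses ExpressionPayload.
  def conflictingKeys(tableConfig: Map[String, String],
                      params: Map[String, String]): Seq[String] = {
    val payloadIsExpressionPayload =
      params.getOrElse(PayloadKey, "") == ExpressionPayload
    params.toSeq.collect {
      case (key, value)
        if !(payloadIsExpressionPayload && mergeKeys.contains(key)) &&
          tableConfig.get(key).exists(_ != value) => key
    }.sorted
  }

  // Table created with defaults (values are illustrative).
  val tableDefaults: Map[String, String] = Map(
    PayloadKey -> "default-payload",
    MergeModeKey -> "EVENT_TIME",
    MergerStrategyKey -> "default-id")

  // Configs as they might arrive from a MERGE INTO (values are illustrative).
  val mergeIntoParams: Map[String, String] = Map(
    PayloadKey -> ExpressionPayload,
    MergeModeKey -> "CUSTOM",
    MergerStrategyKey -> "payload-based-id")
}
```

With these inputs, `conflictingKeys(tableDefaults, mergeIntoParams)` reports no conflicts even though all three values differ. If only the payload key were skipped, merge mode and merger strategy would still be flagged, which is why the three keys need to be ignored as a group.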