wecharyu commented on code in PR #8219:
URL: https://github.com/apache/hudi/pull/8219#discussion_r1146470877


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -150,20 +150,7 @@ object HoodieSparkSqlWriter {
      case _ => throw new HoodieException("hoodie only support org.apache.spark.serializer.KryoSerializer as spark.serializer")
    }
    val tableType = HoodieTableType.valueOf(hoodieConfig.getString(TABLE_TYPE))
-    var operation = WriteOperationType.fromValue(hoodieConfig.getString(OPERATION))
-    // TODO clean up
-    // It does not make sense to allow upsert() operation if INSERT_DROP_DUPS is true
-    // Auto-correct the operation to "insert" if OPERATION is set to "upsert" wrongly
-    // or not set (in which case it will be set as "upsert" by parametersWithWriteDefaults()) .
-    if (hoodieConfig.getBoolean(INSERT_DROP_DUPS) &&

Review Comment:
   With `INSERT_DROP_DUPS` enabled, all incoming write data is de-duplicated in both `UPSERT` and `INSERT` operations, so it is hard to tell the two operations apart from the written data alone in this case.
   
   We could instead read the operation type from the latest `$tbl_path/.hoodie/$timestamp.commit` file. Do you think that is appropriate? Any suggestions would be much appreciated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
