[GitHub] [hudi] KnightChess commented on a diff in pull request #6824: [HUDI-4946] fix merge into with no preCombineField has dup row by onl…

GitBox Wed, 19 Oct 2022 05:50:36 -0700


KnightChess commented on code in PR #6824:
URL: https://github.com/apache/hudi/pull/6824#discussion_r999401951



##########
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/MergeIntoHoodieTableCommand.scala:
##########
@@ -160,7 +167,7 @@ case class MergeIntoHoodieTableCommand(mergeInto: MergeIntoTable) extends Hoodie
 
       // column order changed after left anti join , we should keep column order of source dataframe
       val cols = removeMetaFields(sourceDF).columns
-      executeInsertOnly(insertSourceDF.select(cols.head, cols.tail:_*), parameters)
+      executeInsertOnly(insertSourceDF.select(cols.head, cols.tail:_*), writeParam)

Review Comment:
   Yes, setting `hoodie.combine.before.insert` will de-duplicate, but this is not 
friendly to users.
   When a table is created with a precombine field and the user upserts data 
with MERGE INTO SQL, duplicate records may be produced depending on how the 
merge statement is written. To avoid that, the user has to set 
`hoodie.combine.before.insert` specifically for the one case where the 
statement has only a not-matched branch. Users will be confused that, for a 
table with a precombine key, a merge statement sometimes behaves as an 
`upsert` and sometimes as an `insert`.
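
   To illustrate the difference, here is a toy model in plain Scala collections. 
It is only a sketch and does not use Hudi or Spark APIs: `Rec`, `insertOnly`, 
and `insertWithCombine` are hypothetical names, `id` stands in for the record 
key and `ts` for the precombine field. It assumes the duplicates come from two 
source rows that share a key not yet present in the target.

```scala
// Toy model only: plain Scala collections, not Hudi or Spark APIs.
final case class Rec(id: Int, name: String, ts: Long)

object MergeSemanticsSketch {
  // Source rows whose key is absent from the target (the left anti join).
  def notMatched(table: Vector[Rec], source: Seq[Rec]): Seq[Rec] = {
    val keys = table.map(_.id).toSet
    source.filterNot(r => keys.contains(r.id))
  }

  // Insert-only path: append every not-matched source row as-is,
  // so two source rows with the same key both end up in the table.
  def insertOnly(table: Vector[Rec], source: Seq[Rec]): Vector[Rec] =
    table ++ notMatched(table, source)

  // Insert path with a combine step first: keep one row per key,
  // the one with the largest ts, then append.
  def insertWithCombine(table: Vector[Rec], source: Seq[Rec]): Vector[Rec] = {
    val combined = notMatched(table, source)
      .groupBy(_.id)
      .values
      .map(_.maxBy(_.ts))
      .toSeq
      .sortBy(_.id)
    table ++ combined
  }
}
```

   With a target holding key 1 and a source holding two rows for key 2, 
`insertOnly` leaves two rows for key 2 while `insertWithCombine` leaves one, 
which is the behaviour users would expect from a table that has a precombine 
key.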



