KnightChess commented on code in PR #6824:
URL: https://github.com/apache/hudi/pull/6824#discussion_r999401951
##########
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/MergeIntoHoodieTableCommand.scala:
##########
@@ -160,7 +167,7 @@ case class MergeIntoHoodieTableCommand(mergeInto: MergeIntoTable) extends Hoodie
 // column order changed after left anti join , we should keep column order of source dataframe
 val cols = removeMetaFields(sourceDF).columns
- executeInsertOnly(insertSourceDF.select(cols.head, cols.tail:_*), parameters)
+ executeInsertOnly(insertSourceDF.select(cols.head, cols.tail:_*), writeParam)
Review Comment:
Yes, setting `hoodie.combine.before.insert` will de-duplicate, but this is not
user-friendly.
When a table is created with a precombine field and MERGE INTO SQL is used to
upsert data, it may produce duplicate records depending on how the user writes
the merge statement. To avoid that, the user has to set
`hoodie.combine.before.insert` for the one case where the statement has only a
not-matched branch. Users will be confused: for a table with a precombine key,
the same MERGE INTO syntax sometimes behaves like `upsert` and sometimes like
`insert`.
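
To make the difference concrete, here is a minimal plain-Scala sketch of the two behaviors. It is not Hudi code: `Rec`, `insertOnly`, and `combineBeforeInsert` are hypothetical names, and the rule "keep the row with the largest precombine value per key" is my assumption about what combining before insert does.

```scala
// Hypothetical record shape: `key` stands in for the record key and
// `ts` for the precombine field. Neither name comes from Hudi.
final case class Rec(key: String, ts: Long, value: String)

// Insert-only path without combining: every incoming row is written as-is,
// so rows that share a key all end up in the table.
def insertOnly(incoming: Seq[Rec]): Seq[Rec] = incoming

// Combine-before-insert (assumed semantics): keep one row per key,
// choosing the row with the largest precombine value.
def combineBeforeInsert(incoming: Seq[Rec]): Seq[Rec] =
  incoming
    .groupBy(_.key)
    .values
    .map(_.maxBy(_.ts))
    .toSeq
    .sortBy(_.key)

// Source rows for a merge that has only a not-matched branch.
val source = Seq(
  Rec("id-1", ts = 1L, value = "old"),
  Rec("id-1", ts = 2L, value = "new"),
  Rec("id-2", ts = 1L, value = "only")
)

val withoutCombine = insertOnly(source)          // "id-1" is written twice
val withCombine = combineBeforeInsert(source)    // one row per key
```

With the same source rows, the first path keeps both `id-1` rows while the second keeps only the one with the larger `ts`, which is the inconsistency a user would see between the two kinds of merge statement.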