JingsongLi commented on code in PR #8081:
URL: https://github.com/apache/paimon/pull/8081#discussion_r3354689325


##########
paimon-spark/paimon-spark-common/src/main/scala/org/apache/paimon/spark/commands/MergeIntoPaimonDataEvolutionTable.scala:
##########
@@ -189,30 +189,53 @@ case class MergeIntoPaimonDataEvolutionTable(
       map.toMap
     }
 
-    // step 1: find the related data splits, make it target file plan
-    val dataSplits: Seq[DataSplit] =
-      targetRelatedSplits(sparkSession, tableSplits, firstRowIds, firstRowIdToBlobFirstRowIds)
-    val touchedFileTargetRelation =
-      createNewScanPlan(dataSplits, targetRelation)
-
-    // step 2: invoke update action
-    val updateCommit =
-      if (matchedActions.nonEmpty) {
-        val updateResult =
-          updateActionInvoke(dataSplits, sparkSession, touchedFileTargetRelation, firstRowIds)
-        checkUpdateResult(updateResult)
-      } else Nil
-
-    // step 3: invoke insert action
-    val insertCommit =
-      if (notMatchedActions.nonEmpty)
-        insertActionInvoke(sparkSession, touchedFileTargetRelation)
-      else Nil
-
-    if (plan.snapshotId() != null) {
-      writer.rowIdCheckConflict(plan.snapshotId())
+    val persistSourceDss: Option[Dataset[Row]] =
+      if (table.coreOptions().dataEvolutionMergeIntoSourcePersist() && matchedActions.nonEmpty) {

Review Comment:
   This guard means the new option has no effect for insert-only MERGE
   statements (`WHEN NOT MATCHED` without any matched update). That path can
   still load the source twice when file pruning is enabled: `targetRelatedSplits`
   builds `sourceDss` to find touched splits, and `insertActionInvoke` then builds
   the left-anti join from `sourceTable` again. Since this option is described as
   persisting the source for merge-into, could we enable it whenever the source
   may be reused, e.g.
   `table.coreOptions().dataEvolutionMergeIntoSourcePersist() && (matchedActions.nonEmpty || notMatchedActions.nonEmpty)`,
   or otherwise document that it is intentionally update-only?
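
   To make the difference concrete, here is a minimal sketch of the two guard
   variants as pure functions. The object `SourcePersistGuard`, its method
   names, and the `Seq[String]` action lists are hypothetical stand-ins for
   illustration only; they are not code from this PR, and `persistEnabled`
   merely models the value of
   `table.coreOptions().dataEvolutionMergeIntoSourcePersist()`.

```scala
// Hypothetical illustration, not code from the PR.
// `persistEnabled` models table.coreOptions().dataEvolutionMergeIntoSourcePersist();
// the Seq[String] parameters model the MERGE action lists.
object SourcePersistGuard {

  // Guard as written in the diff: the source is persisted only
  // when the MERGE statement has at least one matched action.
  def updateOnly(persistEnabled: Boolean, matchedActions: Seq[String]): Boolean =
    persistEnabled && matchedActions.nonEmpty

  // Guard suggested in this comment: the source is persisted whenever
  // either action list is non-empty, which covers insert-only statements.
  def anyAction(
      persistEnabled: Boolean,
      matchedActions: Seq[String],
      notMatchedActions: Seq[String]): Boolean =
    persistEnabled && (matchedActions.nonEmpty || notMatchedActions.nonEmpty)
}
```

   For an insert-only MERGE with the option turned on,
   `updateOnly(true, Nil)` is false while
   `anyAction(true, Nil, Seq("INSERT"))` is true, which is the gap described
   above. Both variants stay false when the option is off.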



