Re: [PR] [spark] Support MergeInto on Paimon append-only table in spark V2 write [paimon]

via GitHub Wed, 04 Feb 2026 05:32:34 -0800


YannByron commented on code in PR #7200:
URL: https://github.com/apache/paimon/pull/7200#discussion_r2764032625



##########
paimon-spark/paimon-spark-common/src/main/scala/org/apache/paimon/spark/rowops/PaimonCopyOnWriteScan.scala:
##########
@@ -51,10 +54,28 @@ case class PaimonCopyOnWriteScan(
   extends BaseScan
   with SupportsRuntimeV2Filtering {
 
-  override def inputSplits: Array[Split] = dataSplits.asInstanceOf[Array[Split]]
+  // Track whether filter() has been called
+  @volatile private var filterApplied: Boolean = false
+
+  override def inputSplits: Array[Split] = {
+    if (!filterApplied) {
+      // If filter() was never called, it means Spark decided not to use runtime filtering
+      // (e.g., FULL OUTER JOIN for NOT MATCHED BY SOURCE). Read all data.
+      loadAllSplits()
+    }
+    dataSplits.asInstanceOf[Array[Split]]
+  }
 
   var dataSplits: Array[DataSplit] = Array()
 
+  private def loadAllSplits(): Unit = {

Review Comment:
   Extract the initialization of `snapshotReader` from the `loadAllSplits` and
`filter` methods, make it a member variable, and then define `dataSplits` as a
`lazy val`. The `filter` method then only needs to apply
`SnapshotReader#withDataFileNameFilter` when it is called.
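A minimal, self-contained sketch of the suggested structure follows. It does not use Paimon or Spark: `SketchSnapshotReader`, `SketchCopyOnWriteScan`, `read()`, and the idea of a "split" being just a file name are all hypothetical stand-ins, and `withDataFileNameFilter` here only mimics the intent of Paimon's `SnapshotReader#withDataFileNameFilter`, not its real signature. It assumes that Spark calls `filter` (if at all) before it first asks for the input splits.

```scala
// Hypothetical stand-in for Paimon's SnapshotReader (NOT the real API).
// It models only two things: narrowing by data file name, and producing splits.
final class SketchSnapshotReader(allFiles: Seq[String]) {
  // No filter by default: every data file is kept.
  private var fileNameFilter: String => Boolean = _ => true

  // Mimics the idea of SnapshotReader#withDataFileNameFilter:
  // record a predicate and return this reader for chaining.
  def withDataFileNameFilter(f: String => Boolean): SketchSnapshotReader = {
    fileNameFilter = f
    this
  }

  // In this sketch each "split" is simply a data file name.
  def read(): Seq[String] = allFiles.filter(fileNameFilter)
}

// Hypothetical stand-in for PaimonCopyOnWriteScan, following the review suggestion.
final class SketchCopyOnWriteScan(files: Seq[String]) {
  // Built once as a member, instead of separately in loadAllSplits() and filter().
  private val snapshotReader = new SketchSnapshotReader(files)

  // Evaluated on first access. If filter() ran earlier, its predicate is already
  // on the reader; if it never ran, all files are read. No boolean flag is needed.
  lazy val dataSplits: Seq[String] = snapshotReader.read()

  def inputSplits: Seq[String] = dataSplits

  // Runtime filtering only narrows the reader; it never loads splits itself.
  def filter(fileNames: Set[String]): Unit =
    snapshotReader.withDataFileNameFilter(name => fileNames.contains(name))
}
```

For example, a scan over `Seq("a.parquet", "b.parquet")` on which `filter` is never called returns both files from `inputSplits`, while calling `filter(Set("b.parquet"))` first leaves only `"b.parquet"`. One caveat of this design: once `dataSplits` has been forced, a later `filter` call no longer changes the result, so the sketch relies on the call order assumed above.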



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
