YannByron commented on code in PR #7200:
URL: https://github.com/apache/paimon/pull/7200#discussion_r2764032625


##########
paimon-spark/paimon-spark-common/src/main/scala/org/apache/paimon/spark/rowops/PaimonCopyOnWriteScan.scala:
##########
@@ -51,10 +54,28 @@ case class PaimonCopyOnWriteScan(
   extends BaseScan
   with SupportsRuntimeV2Filtering {
 
-  override def inputSplits: Array[Split] = dataSplits.asInstanceOf[Array[Split]]
+  // Track whether filter() has been called
+  @volatile private var filterApplied: Boolean = false
+
+  override def inputSplits: Array[Split] = {
+    if (!filterApplied) {
+      // If filter() was never called, it means Spark decided not to use runtime filtering
+      // (e.g., FULL OUTER JOIN for NOT MATCHED BY SOURCE). Read all data.
+      loadAllSplits()
+    }
+    dataSplits.asInstanceOf[Array[Split]]
+  }
 
   var dataSplits: Array[DataSplit] = Array()
 
+  private def loadAllSplits(): Unit = {

Review Comment:
   Extract the initialization of `snapshotReader` from the `loadAllSplits` and 
`filter` methods into a member variable, and define `dataSplits` as a lazy 
val. The `filter` method then only needs to apply 
`SnapshotReader#withDataFileNameFilter` when it is called.
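
   A minimal sketch of the suggested shape, not the actual Paimon code. The 
`SnapshotReader` and `DataSplit` types below are simplified stand-ins so the 
example compiles on its own, and `readSplits`, `CopyOnWriteScanSketch`, and 
`touchedFiles` are hypothetical names. Only `withDataFileNameFilter` 
corresponds to a method named in this comment, and its real signature may 
differ.

```scala
// Hypothetical stand-in for Paimon's DataSplit; only a file name is modeled.
final case class DataSplit(fileName: String)

// Hypothetical stand-in for SnapshotReader. The real class has a different API;
// readSplits() is an invented name used only for this sketch.
final class SnapshotReader(allFiles: Seq[String]) {
  private var fileNameFilter: String => Boolean = _ => true

  def withDataFileNameFilter(f: String => Boolean): SnapshotReader = {
    fileNameFilter = f
    this
  }

  def readSplits(): Array[DataSplit] =
    allFiles.filter(fileNameFilter).map(DataSplit(_)).toArray
}

final class CopyOnWriteScanSketch(allFiles: Seq[String]) {
  // Built once as a member instead of separately in loadAllSplits() and filter().
  private val snapshotReader = new SnapshotReader(allFiles)

  // Planned on first access: reads everything if filter() was never called,
  // or only the matching files if filter() ran beforehand.
  lazy val dataSplits: Array[DataSplit] = snapshotReader.readSplits()

  // Only narrows the reader; no filterApplied flag is needed.
  def filter(touchedFiles: Set[String]): Unit =
    snapshotReader.withDataFileNameFilter(name => touchedFiles.contains(name))

  def inputSplits: Array[DataSplit] = dataSplits
}
```

   One caveat of this shape: once `dataSplits` has been forced, a later call 
to `filter` has no effect, so it relies on `filter` being invoked before the 
splits are first read.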



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
