Kkkaneki-k commented on code in PR #6697:
URL: https://github.com/apache/paimon/pull/6697#discussion_r2633497378
##########
paimon-spark/paimon-spark-3.2/src/main/scala/org/apache/paimon/spark/PaimonBaseScanBuilder.scala:
##########
@@ -71,7 +77,28 @@ abstract class PaimonBaseScanBuilder
       filter =>
         val predicate = converter.convertIgnoreFailure(filter)
         if (predicate == null) {
-          postScan.append(filter)
+          val rowTypeWithRowId = new RowType(
+            false,
+            Collections.singletonList(new DataField(-1, ROW_ID.name(), DataTypes.BIGINT())))
+          val converterWithRowId = new SparkFilterConverter(rowTypeWithRowId)
+          val newPredicate = converterWithRowId.convertIgnoreFailure(filter)
Review Comment:
> Can we pass the filter containing RowId into PaimonScanBuilder, and let Paimon Core parse it and generate the corresponding range itself? It feels redundant to reimplement this logic in every engine.
Thanks for your review! I've considered your suggestion carefully, and I think that change could introduce two problems:
1. When a filter containing `_ROW_ID` cannot be consumed, we need to return it to the engine as a post-scan filter. That would be hard to do if Paimon Core consumed the filter itself and generated the corresponding range.
2. ReadBuilder currently takes filters containing `_ROW_ID` and filters without `_ROW_ID` as separate inputs during the build process, so the engine still has to differentiate the two kinds of filters and pass them in separately (unless we change ReadBuilder to do this differentiation itself during the build). See the sketch after this list for what the engine-side split looks like.
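
To make point 2 concrete, here is a rough sketch of the engine-side split I have in mind. Everything here is illustrative: `convertRegular`/`convertRowId` stand in for `SparkFilterConverter.convertIgnoreFailure` against the table's row type and the `_ROW_ID`-only row type, and `pushRegular`/`pushRowId` stand in for the two separate ReadBuilder inputs; none of these names are the actual API.

```scala
import org.apache.spark.sql.sources.Filter

import scala.collection.mutable.ArrayBuffer

// Sketch only: splits Spark filters into regular pushdown, _ROW_ID pushdown,
// and post-scan filters that are handed back to the engine.
def splitAndPush[P](
    filters: Seq[Filter],
    convertRegular: Filter => Option[P], // convert against the table row type
    convertRowId: Filter => Option[P],   // convert against the _ROW_ID-only row type
    pushRegular: P => Unit,              // hypothetical regular ReadBuilder input
    pushRowId: P => Unit                 // hypothetical _ROW_ID ReadBuilder input
): Seq[Filter] = {
  val postScan = ArrayBuffer.empty[Filter]
  filters.foreach { filter =>
    convertRegular(filter) match {
      case Some(p) => pushRegular(p) // normal pushdown path
      case None =>
        convertRowId(filter) match {
          case Some(p) => pushRowId(p) // _ROW_ID range path
          // neither side consumed it: return it to Spark as a post-scan filter
          case None => postScan += filter
        }
    }
  }
  postScan.toSeq
}
```

Passing the push targets in as functions keeps the sketch independent of whatever entry points ReadBuilder ends up exposing, but the two-way split itself is what every engine would have to repeat under the current design.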
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]