Kkkaneki-k commented on code in PR #6697:
URL: https://github.com/apache/paimon/pull/6697#discussion_r2633497378
##########
paimon-spark/paimon-spark-3.2/src/main/scala/org/apache/paimon/spark/PaimonBaseScanBuilder.scala:
##########
@@ -71,7 +77,28 @@ abstract class PaimonBaseScanBuilder
       filter =>
         val predicate = converter.convertIgnoreFailure(filter)
         if (predicate == null) {
-          postScan.append(filter)
+          val rowTypeWithRowId = new RowType(
+            false,
+            Collections.singletonList(new DataField(-1, ROW_ID.name(), DataTypes.BIGINT())))
+          val converterWithRowId = new SparkFilterConverter(rowTypeWithRowId)
+          val newPredicate = converterWithRowId.convertIgnoreFailure(filter)
Review Comment:
> Can we pass the filter containing RowId into PaimonScanBuilder, and let Paimon Core parse it and generate the corresponding range itself? It feels redundant to reimplement this logic in every engine.
Thanks for your review! I've considered your suggestion carefully, and I think that change could introduce two problems:
1. When a filter containing `_ROW_ID` cannot be consumed, we need to return it to the engine as a post-scan filter. That would be hard to do if Paimon Core consumed the filter itself and generated the corresponding range.
2. ReadBuilder currently takes filters containing `_ROW_ID` and filters without `_ROW_ID` as separate inputs during the build process, so the engine still has to differentiate the two kinds of filters and pass them in separately (unless we change ReadBuilder to do this differentiation itself during the build). See the sketch after this list for what the engine-side split looks like.
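
To make point 2 concrete, here is a rough sketch of the engine-side split I have in mind. Everything here is illustrative: `convertRegular`/`convertRowId` stand in for `SparkFilterConverter.convertIgnoreFailure` against the table's row type and the `_ROW_ID`-only row type, and `pushRegular`/`pushRowId` stand in for the two separate ReadBuilder inputs; none of these names are the actual API.

```scala
import org.apache.spark.sql.sources.Filter

import scala.collection.mutable.ArrayBuffer

// Sketch only: splits Spark filters into regular pushdown, _ROW_ID pushdown,
// and post-scan filters that are handed back to the engine.
def splitAndPush[P](
    filters: Seq[Filter],
    convertRegular: Filter => Option[P], // convert against the table row type
    convertRowId: Filter => Option[P],   // convert against the _ROW_ID-only row type
    pushRegular: P => Unit,              // hypothetical regular ReadBuilder input
    pushRowId: P => Unit                 // hypothetical _ROW_ID ReadBuilder input
): Seq[Filter] = {
  val postScan = ArrayBuffer.empty[Filter]
  filters.foreach { filter =>
    convertRegular(filter) match {
      case Some(p) => pushRegular(p) // normal pushdown path
      case None =>
        convertRowId(filter) match {
          case Some(p) => pushRowId(p) // _ROW_ID range path
          // neither side consumed it: return it to Spark as a post-scan filter
          case None => postScan += filter
        }
    }
  }
  postScan.toSeq
}
```

Passing the push targets in as functions keeps the sketch independent of whatever entry points ReadBuilder ends up exposing, but the two-way split itself is what every engine would have to repeat under the current design.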
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]