rdblue commented on a change in pull request #3600:
URL: https://github.com/apache/iceberg/pull/3600#discussion_r759661579
##########
File path: core/src/main/java/org/apache/iceberg/ManifestFilterManager.java
##########
@@ -429,13 +425,44 @@ private ManifestFile filterManifestWithDeletedFiles(
}
}
-  private Evaluator strictDeleteEvaluator(PartitionSpec spec) {
-    Expression strictExpr = Projections.strict(spec).project(deleteExpression);
-    return new Evaluator(spec.partitionType(), strictExpr);
-  }
+  // an evaluator that checks whether rows in a file may/must match a given expression
+  // this class first partially evaluates the provided expression using the partition tuple
+  // and then checks the remaining part of the expression using metrics evaluators
+  private class ExpressionEvaluator {
+    private final Schema tableSchema;
+    private final ResidualEvaluator residualEvaluator;
+    private final StructLikeMap<Pair<InclusiveMetricsEvaluator, StrictMetricsEvaluator>> metricsEvaluators;
+
+    // TODO: support case sensitive flags
+    ExpressionEvaluator(Schema tableSchema, PartitionSpec spec, Expression expr) {
+      this.tableSchema = tableSchema;
+      this.residualEvaluator = ResidualEvaluator.of(spec, expr, true);
+      this.metricsEvaluators = StructLikeMap.create(spec.partitionType());
+    }
-  private Evaluator inclusiveDeleteEvaluator(PartitionSpec spec) {
-    Expression inclusiveExpr = Projections.inclusive(spec).project(deleteExpression);
-    return new Evaluator(spec.partitionType(), inclusiveExpr);
+    boolean rowsMightMatch(F file) {
+      Pair<InclusiveMetricsEvaluator, StrictMetricsEvaluator> evaluators = metricsEvaluators(file);
+      InclusiveMetricsEvaluator inclusiveMetricsEvaluator = evaluators.first();
+      return inclusiveMetricsEvaluator.eval(file);
+    }
+
+    boolean rowsMustMatch(F file) {
+      Pair<InclusiveMetricsEvaluator, StrictMetricsEvaluator> evaluators = metricsEvaluators(file);
+      StrictMetricsEvaluator strictMetricsEvaluator = evaluators.second();
+      return strictMetricsEvaluator.eval(file);
+    }
+
+    private Pair<InclusiveMetricsEvaluator, StrictMetricsEvaluator> metricsEvaluators(F file) {
+      // this logic depends on ResidualEvaluator that behaves in the following way
+      // if strict projection returns true -> the pred would return true -> replace the pred with true
+      // if inclusive projection returns false -> the pred would return false -> replace the pred with false
+      // otherwise, keep the original predicate and try evaluating it using metrics
Review comment:
I'm not sure that I agree with this. The residual evaluator does this
for every predicate in the expression. It effectively removes any predicate
that is determined by the strict evaluator and returns the part of the
expression that still needs to be evaluated for rows in the given partition.
For example, `id = 5 AND ts > '2021-11-30T10:00:00'` for partition
`(id_bucket=0, ts_day='2021-12-01')` will return `id = 5` because the entire
partition matches `ts > '2021-11-30T10:00:00'`.
I think the actual logic here is correct. It only uses the predicates that
are not satisfied by the partition itself. Since the residual evaluator handles
transform expressions and those are usually determined by the partition tuple,
those are effectively removed and you can just use metrics. This is a really
good insight.
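To make the example above concrete, here is a minimal, self-contained sketch of the residual idea. It is a toy model, not Iceberg's `ResidualEvaluator`: the class `ResidualSketch`, its methods, the `id % 5` bucket function, and the use of integer hours for timestamps are all assumptions made for illustration (Iceberg's bucket transform hashes its input, and the real evaluator works on `Expression` trees rather than strings).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of residual computation; not the Iceberg API.
public class ResidualSketch {
  // What a partition tuple alone can say about one predicate.
  enum Outcome { ALWAYS_TRUE, ALWAYS_FALSE, UNDETERMINED }

  static final int NUM_BUCKETS = 5;    // assumption: stand-in for a bucket transform
  static final int HOURS_PER_DAY = 24; // timestamps are modeled as integer hours

  // `id = value` checked against the bucket stored in the partition tuple.
  static Outcome idEquals(int value, int idBucket) {
    // A matching bucket cannot prove equality because other ids share the bucket,
    // so the predicate has to be kept. A different bucket rules it out.
    return (value % NUM_BUCKETS == idBucket) ? Outcome.UNDETERMINED : Outcome.ALWAYS_FALSE;
  }

  // `ts > boundHour` checked against the day stored in the partition tuple.
  static Outcome tsGreaterThan(int boundHour, int tsDay) {
    int firstHour = tsDay * HOURS_PER_DAY;
    int lastHour = firstHour + HOURS_PER_DAY - 1;
    if (boundHour < firstHour) {
      return Outcome.ALWAYS_TRUE;  // every row in the day is after the bound
    }
    if (boundHour >= lastHour) {
      return Outcome.ALWAYS_FALSE; // no row in the day is after the bound
    }
    return Outcome.UNDETERMINED;   // the bound falls inside the day
  }

  // Residual of an AND: drop predicates the partition satisfies,
  // collapse to "false" if any predicate is ruled out, keep the rest.
  static String residual(String[] predicates, Outcome[] outcomes) {
    List<String> remaining = new ArrayList<>();
    for (int i = 0; i < predicates.length; i++) {
      if (outcomes[i] == Outcome.ALWAYS_FALSE) {
        return "false";
      }
      if (outcomes[i] == Outcome.UNDETERMINED) {
        remaining.add(predicates[i]);
      }
    }
    return remaining.isEmpty() ? "true" : String.join(" AND ", remaining);
  }

  // Residual of `id = 5 AND ts > 10` for the partition (idBucket, tsDay).
  static String residualFor(int idBucket, int tsDay) {
    String[] predicates = {"id = 5", "ts > 10"};
    Outcome[] outcomes = {idEquals(5, idBucket), tsGreaterThan(10, tsDay)};
    return residual(predicates, outcomes);
  }

  public static void main(String[] args) {
    System.out.println(residualFor(0, 1)); // day 1 lies entirely after hour 10
    System.out.println(residualFor(0, 0)); // hour 10 falls inside day 0
    System.out.println(residualFor(1, 1)); // id 5 cannot be in bucket 1
  }
}
```

In this model, the partition `(id_bucket=0, ts_day=1)` plays the role of `(id_bucket=0, ts_day='2021-12-01')`: the timestamp predicate is satisfied by the whole partition and drops out, so only `id = 5` would be left for the metrics evaluators to check.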
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]