wombatu-kun opened a new pull request, #16362:
URL: https://github.com/apache/iceberg/pull/16362

   ## What
   
   Resolves the long-standing `TODO: translate truncate(col) == value to 
startsWith(value)` in `UnboundPredicate.bindLiteralOperation`. When the term of 
an `EQ`/`NOT_EQ` predicate is a **string** `truncate[W]` transform, binding now 
produces an exactly-equivalent predicate on the untransformed source column.
   
   The equivalence depends on the literal length vs. the truncate width `W`:
   
   | Condition | `truncate[W](col) == v` | `truncate[W](col) != v` |
   |---|---|---|
   | `len(v) > W` | `alwaysFalse()` | `alwaysTrue()` |
   | `len(v) == W` | `col STARTS_WITH v` | `col NOT_STARTS_WITH v` |
   | `len(v) < W` | `col == v` | `col != v` |
   
   Integer/long/decimal/binary truncate and all other operators are 
intentionally left unchanged — they have no exact source-column equivalence.
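
The table can be checked with a small standalone sketch. It is not the Iceberg implementation: the class `TruncateRewriteSketch`, its `Rewrite` enum, and its methods are hypothetical names, and it assumes `truncate[W]` keeps the first `W` Unicode code points and that the literal length is counted the same way.

```java
// Illustrative sketch only; class and method names are hypothetical, not Iceberg API.
final class TruncateRewriteSketch {
  enum Rewrite { ALWAYS_FALSE, STARTS_WITH, EQUALS }

  // Assumed semantics of truncate[W] on strings: keep the first W code points.
  static String truncate(String s, int width) {
    if (s.codePointCount(0, s.length()) <= width) {
      return s;
    }
    return s.substring(0, s.offsetByCodePoints(0, width));
  }

  // Pick the source-column rewrite for truncate[W](col) == literal.
  static Rewrite classify(String literal, int width) {
    int len = literal.codePointCount(0, literal.length());
    if (len > width) {
      return Rewrite.ALWAYS_FALSE; // a truncated value is never longer than W
    }
    if (len == width) {
      return Rewrite.STARTS_WITH;
    }
    return Rewrite.EQUALS;
  }

  // The predicate as written: compare the truncated column value to the literal.
  static boolean evalOriginal(String col, String literal, int width) {
    return truncate(col, width).equals(literal);
  }

  // The predicate after the rewrite: evaluated on the untransformed column value.
  static boolean evalRewritten(String col, String literal, int width) {
    switch (classify(literal, width)) {
      case ALWAYS_FALSE:
        return false;
      case STARTS_WITH:
        return col.startsWith(literal);
      default:
        return col.equals(literal);
    }
  }

  public static void main(String[] args) {
    String[] values = {"", "a", "ab", "abc", "abcd", "abd", "xyz", "xyzz"};
    String[] literals = {"", "ab", "abc", "abcd", "xyz"};
    for (String v : values) {
      for (String lit : literals) {
        if (evalOriginal(v, lit, 3) != evalRewritten(v, lit, 3)) {
          throw new AssertionError(v + " / " + lit);
        }
      }
    }
  }
}
```

The `NOT_EQ` column of the table is the negation of each case, so the same comparison covers it.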
   
   ## Why
   
   This rewrite is already assumed elsewhere in the expression evaluators: 
`InclusiveMetricsEvaluator.startsWith()` returns `ROWS_MIGHT_MATCH` for 
non-identity transform terms with the explicit comment *"truncate must be 
rewritten in binding"*. Until now the binder never performed that rewrite for 
equality, so `truncate(col) == v` kept an opaque `BoundTransform` term and 
metrics/dictionary/partition pruning could not use the column. After this 
change such predicates prune correctly (e.g. `equal(truncate("str",3),"xyz")` 
against bounds `["abc","abe"]` now skips the file instead of reading it).
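
Why the example file can be skipped is easier to see in a simplified model of a prefix check against column bounds. This is a sketch, not `InclusiveMetricsEvaluator` itself: the class `StartsWithPruningSketch` and its `mightMatch` method are hypothetical, and it assumes ASCII strings compared with `String.compareTo`.

```java
// Illustrative sketch only; not the InclusiveMetricsEvaluator implementation.
final class StartsWithPruningSketch {
  // Returns false only when no value in [lower, upper] can start with prefix.
  // Truncating to the prefix length preserves (non-strict) ordering, so any
  // matching value forces truncated(lower) <= prefix <= truncated(upper).
  static boolean mightMatch(String prefix, String lower, String upper) {
    int n = prefix.length();
    String lo = lower.substring(0, Math.min(n, lower.length()));
    String hi = upper.substring(0, Math.min(n, upper.length()));
    return prefix.compareTo(lo) >= 0 && prefix.compareTo(hi) <= 0;
  }

  public static void main(String[] args) {
    // equal(truncate("str", 3), "xyz") becomes: str STARTS_WITH "xyz"
    System.out.println(mightMatch("xyz", "abc", "abe")); // bounds exclude the prefix
    System.out.println(mightMatch("abd", "abc", "abe")); // prefix lies within the bounds
  }
}
```

With the opaque transform term there is no source-column predicate to compare against the bounds at all, so the file has to be read.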
   
   ## Implementation notes
   
   - The `< / == / > width` decision is centralized in a single 
`Truncate.lengthRewrite` helper, shared by predicate binding and by 
`TruncateString.project` / `projectStrict`, so the two paths cannot diverge.
   - Removing the `BoundTransform` term would otherwise defeat 
`ProjectionUtil.projectTransformPredicate` (which matches partition transforms 
by `toString()`) and collapse strict projection to `False`. To preserve the 
previous precision, `TruncateString` now projects `EQ`/`NOT_EQ` predicates in 
the EXACT case (`len(v) < W`, where the rewrite is plain equality) directly 
onto the partition value. This is provably the same result the old 
transform-term path produced, since `truncate[W](x) == v ⟺ x == v` when 
`len(v) < W`.
   - New public API is additive only (`Transforms.StringTruncateRewrite` enum + 
`Transforms.stringTruncateRewrite`); `revapi` passes with no accepted-breaks 
entry.
   
   ## Testing
   
   - `TestPredicateBinding`: all three length classes for `EQ`/`NOT_EQ`, 
empty-string literal, plus negatives (non-string truncate, non-truncate 
transforms, other operators unchanged).
   - `TestStartsWith`: runtime-equivalence check via `Evaluator`, including the 
EXACT-vs-prefix distinction.
   - `TestInclusiveMetricsEvaluatorWithTransforms`: a pruning case that now 
prunes where it previously could not.
   - Full `:iceberg-api:test` and `:iceberg-core:test`, all 
transform/projection/residual regression suites, `spotlessCheck`, and 
`:iceberg-api:revapi` pass.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

