bharos opened a new pull request, #15883:
URL: https://github.com/apache/iceberg/pull/15883
### What
Implements bounds-based evaluation for `notStartsWith` in
`StrictMetricsEvaluator`, replacing the existing TODO with actual logic.
Previously, `notStartsWith` always returned `ROWS_MIGHT_NOT_MATCH`,
which prevented the engine from eliminating the residual predicate even
when file-level column bounds made it provable that no value could start
with the given prefix.
### Changes
- **`StrictMetricsEvaluator.notStartsWith`**: Added checks for nested
columns, all-nulls columns, and lower/upper bound comparisons against
the prefix. Returns `ROWS_MUST_MATCH` when bounds prove the prefix is
entirely outside the value range.
- **`TestStrictMetricsEvaluator`**: Added 8 test methods covering:
all-nulls, bounds above/below/overlapping the prefix, wider ranges,
missing stats, some-nulls with bounds outside prefix, and prefix
longer than bounds.
### How it works
For `NOT STARTS WITH <prefix>`:
- If the lower bound, truncated to `min(prefixLen, boundLen)` characters,
  is strictly greater than the prefix, every value sorts above all strings
  that start with the prefix → `ROWS_MUST_MATCH`
- If the upper bound, truncated to `min(prefixLen, boundLen)` characters,
  is strictly less than the prefix, every value sorts below all strings
  that start with the prefix → `ROWS_MUST_MATCH`
- Otherwise, fall through to `ROWS_MIGHT_NOT_MATCH` (conservative)

This follows the same pattern used by `notEq` and `notIn` in this
class, including the null-handling convention.
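The bound rules above, together with the nested-column and all-nulls guards listed under Changes, can be sketched as a standalone check. This is a minimal illustration, not the code in the PR. `NotStartsWithSketch` and `ColumnStats` are hypothetical. Plain `String.compareTo` stands in for whatever comparator the evaluator actually uses; it agrees with code-point order for the BMP-only strings used here. The sketch also assumes that null values satisfy `NOT STARTS WITH`, following the `notEq` convention the PR cites.

```java
/**
 * Hypothetical sketch of a strict NOT STARTS WITH check over per-file column stats.
 * It mirrors the rules described in the PR; it is not Iceberg's implementation.
 */
public class NotStartsWithSketch {

  // Assumed to mirror the evaluator's result constants (true = every row matches).
  static final boolean ROWS_MUST_MATCH = true;
  static final boolean ROWS_MIGHT_NOT_MATCH = false;

  /** Hypothetical per-file stats for one string column; null fields mean "not collected". */
  static final class ColumnStats {
    final boolean nested;
    final Long valueCount;
    final Long nullCount;
    final String lower;
    final String upper;

    ColumnStats(boolean nested, Long valueCount, Long nullCount, String lower, String upper) {
      this.nested = nested;
      this.valueCount = valueCount;
      this.nullCount = nullCount;
      this.lower = lower;
      this.upper = upper;
    }
  }

  /** First min(prefixLen, boundLen) chars of the bound. */
  private static String truncate(String bound, String prefix) {
    return bound.substring(0, Math.min(prefix.length(), bound.length()));
  }

  static boolean notStartsWith(ColumnStats stats, String prefix) {
    // Bounds are not used for nested fields; stay conservative.
    if (stats.nested) {
      return ROWS_MIGHT_NOT_MATCH;
    }
    // All values null: assumed (as with notEq) that nulls satisfy NOT STARTS WITH.
    if (stats.valueCount != null && stats.nullCount != null
        && stats.nullCount.equals(stats.valueCount)) {
      return ROWS_MUST_MATCH;
    }
    // Every value >= lower; if lower's first prefixLen chars already sort above
    // the prefix, no value can start with it. (String.compareTo = UTF-16 order.)
    if (stats.lower != null && truncate(stats.lower, prefix).compareTo(prefix) > 0) {
      return ROWS_MUST_MATCH;
    }
    // Every value <= upper; if upper's first prefixLen chars sort below the
    // prefix, no value can start with it.
    if (stats.upper != null && truncate(stats.upper, prefix).compareTo(prefix) < 0) {
      return ROWS_MUST_MATCH;
    }
    // Missing stats, or bounds that overlap the prefix range.
    return ROWS_MIGHT_NOT_MATCH;
  }

  public static void main(String[] args) {
    String prefix = "abc";
    System.out.println(notStartsWith(new ColumnStats(false, 5L, 0L, "m", "p"), prefix));       // true: above
    System.out.println(notStartsWith(new ColumnStats(false, 5L, 0L, "a", "abb"), prefix));     // true: below
    System.out.println(notStartsWith(new ColumnStats(false, 5L, 0L, "abb", "abd"), prefix));   // false: overlaps
    System.out.println(notStartsWith(new ColumnStats(false, 5L, 0L, "abcz", "abd"), prefix));  // false: lower starts with prefix
    System.out.println(notStartsWith(new ColumnStats(false, 5L, 0L, "a", "ab"), prefix));      // true: bounds shorter than prefix
    System.out.println(notStartsWith(new ColumnStats(false, 10L, 10L, null, null), prefix));   // true: all nulls
    System.out.println(notStartsWith(new ColumnStats(false, null, null, null, null), prefix)); // false: no stats
    System.out.println(notStartsWith(new ColumnStats(true, 5L, 0L, "m", "p"), prefix));        // false: nested
  }
}
```

The `"abcz"` case shows why the bound is truncated before comparing. Untruncated, `"abcz" > "abc"` would appear to prove that no value starts with `abc`, even though the file may contain `"abcz"` itself.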

Closes #15882