Hi Paimon Dev Team,
In GitHub Issue #6931 <https://github.com/apache/paimon/issues/6931>,
we identified a correctness issue in filter pushdown when Spark SQL uses
the null-safe equality operator (<=>).
The root cause is that Paimon currently treats both regular equality (=)
and null-safe equality (<=>) as the same Equal operator during predicate
pushdown. Moreover, their negations are uniformly simplified into a generic
NotEqual predicate, which does not capture the distinct semantics of
!(col <=> literal), in particular that it evaluates to true when the column
value is NULL.
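To make the difference concrete, here is a small standalone Java sketch (not
Paimon code; the method names equal and nullSafeEqual are only for
illustration) that models the two operators over a nullable column value:

    import java.util.Objects;

    // Models SQL's three-valued "=" versus the two-valued "<=>".
    public class NullSafeEqualitySemantics {

        // "col = lit": unknown (modeled as null) if either side is NULL.
        static Boolean equal(Integer col, Integer lit) {
            if (col == null || lit == null) {
                return null;
            }
            return col.equals(lit);
        }

        // "col <=> lit": never unknown; NULL <=> NULL is true.
        static boolean nullSafeEqual(Integer col, Integer lit) {
            return Objects.equals(col, lit);
        }

        public static void main(String[] args) {
            Integer col = null;
            Integer lit = 1;
            // col = 1 is unknown for a NULL column; NOT(unknown) stays unknown,
            // so the row is filtered out either way.
            System.out.println(equal(col, lit));          // null
            // !(col <=> 1) is true for a NULL column, so the row must be kept,
            // which means files containing NULLs must not be skipped.
            System.out.println(!nullSafeEqual(col, lit)); // true
        }
    }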
To address this, I propose introducing two new filter operators:
- SafeEqual: Semantically identical to Equal (used when the literal is
non-null).
- NotSafeEqual: Specifically for !(col <=> literal), with a test()
method that respects null-safe semantics:
    @Override
    public boolean test(
            DataType type, long rowCount, Object min, Object max, Long nullCount,
            Object literal) {
        // Under NotSafeEqual semantics, any file that contains NULLs may
        // contain matching rows, so it must be read.
        if (nullCount != null && nullCount > 0) {
            return true;
        }
        // Otherwise the file can be skipped only when every value equals the
        // literal, i.e. min == literal == max.
        return compareLiteral(type, literal, min) != 0
                || compareLiteral(type, literal, max) != 0;
    }
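As a quick illustration of the intended file-skipping decisions, here is a
self-contained restatement of that logic in plain Java (Integer stats and
Objects.equals stand in for compareLiteral; rowCount is kept only to mirror
the proposed signature; nothing here is existing Paimon API):

    import java.util.Objects;

    public class NotSafeEqualSkipSketch {

        // Returns true if the file may contain a row where !(col <=> literal)
        // holds and therefore must be read.
        static boolean test(long rowCount, Integer min, Integer max, Long nullCount, Integer literal) {
            // Any NULL makes !(col <=> literal) true, so the file must be read.
            if (nullCount != null && nullCount > 0) {
                return true;
            }
            // Skip only if every value equals the literal (min == literal == max).
            return !Objects.equals(min, literal) || !Objects.equals(max, literal);
        }

        public static void main(String[] args) {
            // Every value is 1 and there are no nulls: safe to skip.
            System.out.println(test(100L, 1, 1, 0L, 1)); // false -> skip
            // Same min/max but 5 nulls: must be read.
            System.out.println(test(100L, 1, 1, 5L, 1)); // true  -> keep
            // Values range over [0, 9]: must be read.
            System.out.println(test(100L, 0, 9, 0L, 1)); // true  -> keep
        }
    }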
I’ve also updated SparkV2FilterConverter.convert to properly route
EQUAL_NULL_SAFE:
    case EQUAL_NULL_SAFE =>
      sparkPredicate match {
        case BinaryPredicate(transform, literal) =>
          if (literal == null) {
            builder.isNull(transform)
          } else {
            // previously: builder.equal(transform, literal)
            builder.safeEqual(transform, literal)
          }
        case _ =>
          throw new UnsupportedOperationException(
            s"Convert $sparkPredicate is unsupported.")
      }

This ensures:
- col <=> null → isNull(col)
- col <=> value → safeEqual(col, value)
- !(col <=> value) → notSafeEqual(col, value) (negation wiring sketched below)
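For that last mapping, the intent is that the new operators negate into each
other rather than collapsing into the generic NotEqual. The following is only
a toy model of that wiring (it does not use Paimon's actual predicate class
hierarchy); the real change would hook into however Equal currently negates
to NotEqual:

    import java.util.Optional;

    // Toy stand-in for a leaf predicate function.
    interface LeafOp {
        Optional<LeafOp> negate();
    }

    final class SafeEqual implements LeafOp {
        static final SafeEqual INSTANCE = new SafeEqual();

        @Override
        public Optional<LeafOp> negate() {
            // Key point: negate to NotSafeEqual, not to the generic NotEqual.
            return Optional.of(NotSafeEqual.INSTANCE);
        }
    }

    final class NotSafeEqual implements LeafOp {
        static final NotSafeEqual INSTANCE = new NotSafeEqual();

        @Override
        public Optional<LeafOp> negate() {
            return Optional.of(SafeEqual.INSTANCE);
        }
    }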
With these changes, file skipping becomes both correct and efficient,
aligning Paimon’s behavior with Spark’s evaluation semantics.
I’m happy to submit a PR for this fix and welcome any feedback on the
design.
Best regards 😀