Hi Paimon Dev Team,
      In GitHub Issue #6931 <https://github.com/apache/paimon/issues/6931>,
we identified a correctness issue in filter pushdown when Spark SQL uses
the null-safe equality operator (<=>).
      The root cause is that Paimon currently treats both regular equality
(=) and null-safe equality (<=>) as the same Equal operator during predicate
pushdown. Moreover, their negations are uniformly simplified into a generic
NotEqual predicate, which ignores the distinct semantics of
!(col <=> literal), particularly the fact that it evaluates to true when
the column value is NULL.
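
For context, here is a minimal, purely illustrative sketch of the semantic
difference (plain Java; the class and method names are hypothetical):

import java.util.Objects;

class NullSafeEqDemo {
    // col = literal: three-valued logic; NULL propagates, so a NULL row
    // matches neither "c = 1" nor "c != 1".
    static Boolean equal(Integer col, Integer literal) {
        return (col == null || literal == null) ? null : col.equals(literal);
    }

    // col <=> literal: always two-valued; NULL <=> NULL is true, and
    // !(NULL <=> 1) is true, so a NULL row can pass the negated filter.
    static boolean nullSafeEqual(Integer col, Integer literal) {
        return Objects.equals(col, literal);
    }

    public static void main(String[] args) {
        System.out.println(equal(null, 1));          // null: filtered out by "c != 1"
        System.out.println(!nullSafeEqual(null, 1)); // true: kept by "NOT (c <=> 1)"
    }
}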
To address this, I propose introducing two new filter operators:

   - SafeEqual: Semantically identical to Equal (used when the literal is
   non-null).
   - NotSafeEqual: Specifically for !(col <=> literal), with a test()
   method that respects null-safe semantics:

@Override
public boolean test(
        DataType type, long rowCount, Object min, Object max,
        Long nullCount, Object literal) {
    // According to the semantics of NotSafeEqual, as long as the file
    // contains a null, some of its rows satisfy !(col <=> literal).
    if (nullCount != null && nullCount > 0) {
        return true;
    }
    // Otherwise the file can be skipped only when every value equals the
    // literal, i.e. min == literal == max.
    return compareLiteral(type, literal, min) != 0
            || compareLiteral(type, literal, max) != 0;
}
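
For symmetry, SafeEqual's test() could mirror a plain equality range check
(a sketch only, reusing the signature and the compareLiteral helper from
the snippet above; not the final implementation):

@Override
public boolean test(
        DataType type, long rowCount, Object min, Object max,
        Long nullCount, Object literal) {
    // SafeEqual is only built for non-null literals, so nulls in the
    // file can never match it.
    if (nullCount != null && rowCount == nullCount) {
        return false; // the file contains nothing but nulls
    }
    // Keep the file only if the literal can fall within [min, max].
    return compareLiteral(type, literal, min) >= 0
            && compareLiteral(type, literal, max) <= 0;
}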
      I’ve also updated SparkV2FilterConverter.convert to properly route
EQUAL_NULL_SAFE:

case EQUAL_NULL_SAFE =>
  sparkPredicate match {
    case BinaryPredicate(transform, literal) =>
      if (literal == null) {
        builder.isNull(transform)
      } else {
        // previously: builder.equal(transform, literal)
        builder.safeEqual(transform, literal)
      }
    case _ =>
      throw new UnsupportedOperationException(
        s"Convert $sparkPredicate is unsupported.")
  }
      This ensures:

   - col <=> null → isNull(col)
   - col <=> value → safeEqual(col, value)
   - !(col <=> value) → notSafeEqual(col, value)
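
To make the fix concrete, here is a hypothetical end-to-end check (a
sketch only; the table name and data are illustrative, and spark is
assumed to be a SparkSession with a Paimon catalog configured):

spark.sql("CREATE TABLE t (c INT)");
spark.sql("INSERT INTO t VALUES (1), (NULL)");

// Must return the NULL row. With the old generic NotEqual pushdown, a
// data file whose stats show min = max = 1 but nullCount > 0 would be
// skipped, silently dropping that row.
spark.sql("SELECT * FROM t WHERE NOT (c <=> 1)").show();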

With these changes, file skipping becomes both correct and efficient,
aligning Paimon’s behavior with Spark’s evaluation semantics.

I’m happy to submit a PR for this fix and welcome any feedback on the
design.

Best regards 😀
