Hi Jingsong,

Thank you so much for your reply and suggestions—even during the holiday! I
really appreciate it.

My initial intention in introducing new operators was to make the logic
under the EQUAL_NULL_SAFE branch more self-explanatory and readable.
However, I overlooked the fact that this approach would add extra files
whose logic largely overlaps with the existing operators. Thank you for
pointing this out!

I now realize that the same goal can be achieved simply by modifying the
existing code like this:

> case EQUAL_NULL_SAFE =>
>   BinaryPredicate.unapply(sparkPredicate) match {
>     case Some((fieldName, literal)) =>
>       val index = fieldIndex(fieldName)
>       if (literal == null) {
>         // col <=> null only matches null values
>         builder.isNull(index)
>       } else {
>         // col <=> literal is null-safe, so also require the column to be non-null
>         // (previously just: builder.equal(index, convertLiteral(index, literal)))
>         PredicateBuilder.and(
>           builder.isNotNull(index),
>           builder.equal(index, convertLiteral(index, literal)))
>       }
>     case _ =>
>       throw new UnsupportedOperationException(s"Convert $sparkPredicate is unsupported.")
>   }
>
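
To double-check the end-to-end behavior, I also plan to verify it with a
quick query like the following (just a sketch; the table name and rows are
made up for illustration):

> // Hypothetical table t(id INT) containing the rows (1), (2) and (NULL).
> spark.sql("SELECT * FROM t WHERE id <=> 1").show()
> // expected pushdown: and(isNotNull(id), equal(id, 1)); returns only (1)
> spark.sql("SELECT * FROM t WHERE id <=> NULL").show()
> // expected pushdown: isNull(id); returns only the NULL row
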
Will this modification achieve the desired result? Thanks again for your
valuable feedback!

Best regards,
Apollo

On Thu, Jan 1, 2026 at 5:16 PM Jingsong Li <[email protected]> wrote:

> Hi Apollo,
>
> Thanks for reporting this issue in dev.
>
> I think we can solve this without introducing new predicates.
>
> EqualNullSafe =>
>
> 1. if literal is Null, just convert it to IsNull.
> 2. if literal is not Null, convert it to (col is not Null and col =
> literal).
>
> What do you think?
>
> Best,
> Jingsong
>
> On Thu, Jan 1, 2026 at 12:42 PM Apollo Elon <[email protected]> wrote:
> >
> > Hi Paimon Dev Team,
> >       In GitHub Issue #6931
> > <https://github.com/apache/paimon/issues/6931>, we identified a
> > correctness issue in filter pushdown when Spark SQL uses the null-safe
> > equality operator (<=>).
> >       The root cause is that Paimon currently treats both regular
> > equality (=) and null-safe equality (<=>) as the same Equal operator
> > during predicate pushdown. Moreover, their negations are uniformly
> > simplified into a generic NotEqual predicate, which does not account
> > for the distinct semantics of !<=>, particularly the fact that it can
> > be true when the column contains NULL.
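> >       To make the difference concrete, here is a quick illustration (the
> > table t(col INT) containing a NULL row is hypothetical):
> >
> > > // NOT (col = 1) evaluates to NULL for the NULL row, so it is filtered out
> > > spark.sql("SELECT * FROM t WHERE NOT (col = 1)").show()
> > > // NOT (col <=> 1) evaluates to TRUE for the NULL row, so it is returned;
> > > // a file containing nulls therefore cannot be skipped for this predicate
> > > spark.sql("SELECT * FROM t WHERE NOT (col <=> 1)").show()
> >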
> > To address this, I propose introducing two new filter operators:
> >
> >    - SafeEqual: Semantically identical to Equal (used when the literal is
> >    non-null).
> >    - NotSafeEqual: Specifically for !(col <=> literal), with a test()
> >    method that respects null-safe semantics:
> >
> > > @Override
> > > public boolean test(
> > >         DataType type, long rowCount, Object min, Object max,
> > >         Long nullCount, Object literal) {
> > >     // According to the semantics of NotSafeEqual,
> > >     // as long as the file contains "null", it meets the data condition.
> > >     if (!Objects.isNull(nullCount) && nullCount > 0) {
> > >         return true;
> > >     }
> > >     return compareLiteral(type, literal, min) != 0
> > >             || compareLiteral(type, literal, max) != 0;
> > > }
> > >
> >      I’ve also updated SparkV2FilterConverter.convert to properly route
> > EQUAL_NULL_SAFE:
> >
> > > case EQUAL_NULL_SAFE =>
> > >   sparkPredicate match {
> > >     case BinaryPredicate(transform, literal) =>
> > >       if (literal == null) {
> > >         builder.isNull(transform)
> > >       } else {
> > >         // builder.equal(transform, literal)
> > >         builder.safeEqual(transform, literal)
> > >       }
> > >     case _ =>
> > >       throw new UnsupportedOperationException(
> > >         s"Convert $sparkPredicate is unsupported.")
> > >   }
> > >
> >      This ensures:
> >
> >    - col <=> null → isNull(col)
> >    - col <=> value → safeEqual(col, value)
> >    - !(col <=> value) → notSafeEqual(col, value)
> >
> > With these changes, file skipping becomes both correct and efficient,
> > aligning Paimon’s behavior with Spark’s evaluation semantics.
> >
> > I’m happy to submit a PR for this fix and welcome any feedback on the
> > design.
> >
> > Best regards 😀
>
