Cool!

Feel free to open a PR for this.

Best,
Jingsong

On Thu, Jan 1, 2026 at 7:06 PM Apollo Elon <[email protected]> wrote:
>
> Hi Jingsong,
>
> Thank you so much for your reply and suggestions—even during the holiday! I
> really appreciate it.
>
> Initially, my intention for introducing new operators was to make the logic
> under the EQUAL_NULL_SAFE branch more self-explanatory and readable.
> However, I overlooked the fact that this approach would add extra files,
> most of whose logic overlaps with existing ones. Thank you for pointing
> this out!
>
> I now realize that the same goal can be achieved simply by modifying the
> existing code like this:
>
> > case EQUAL_NULL_SAFE =>
> >   BinaryPredicate.unapply(sparkPredicate) match {
> >     case Some((fieldName, literal)) =>
> >       val index = fieldIndex(fieldName)
> >       if (literal == null) {
> >         builder.isNull(index)
> >       } else {
> >         // previously: builder.equal(index, convertLiteral(index, literal))
> >         PredicateBuilder.and(
> >           builder.isNotNull(index),
> >           builder.equal(index, convertLiteral(index, literal))
> >         )
> >       }
> >   }
> >
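> As a quick sanity check of the intended mapping, here is a minimal sketch
> assuming Paimon's PredicateBuilder over a single INT column "c" at index 0
> (illustrative only):
>
> > import org.apache.paimon.predicate.PredicateBuilder
> > import org.apache.paimon.types.{DataTypes, RowType}
> >
> > val builder = new PredicateBuilder(RowType.of(DataTypes.INT()))
> > // c <=> 5  ->  (c IS NOT NULL) AND (c = 5)
> > val nonNullCase = PredicateBuilder.and(builder.isNotNull(0), builder.equal(0, 5))
> > // c <=> NULL  ->  c IS NULL
> > val nullCase = builder.isNull(0)
> >
>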
> Will this modification achieve the desired result? Thanks again for your
> valuable feedback!
>
> Best regards,
> Apollo
>
> Jingsong Li <[email protected]> wrote on Thu, Jan 1, 2026 at 17:16:
>
> > Hi Apollo,
> >
> > Thanks for reporting this issue in dev.
> >
> > I think we can solve this without introducing new predicates.
> >
> > EqualNullSafe =>
> >
> > 1. if literal is Null, just convert it to IsNull.
> > 2. if literal is not Null, convert it to (col is not Null and
> > col = literal).
> >
> > What do you think?
> >
> > Best,
> > Jingsong
> >
> > On Thu, Jan 1, 2026 at 12:42 PM Apollo Elon <[email protected]> wrote:
> > >
> > > Hi Paimon Dev Team,
> > > In GitHub Issue #6931
> > > <https://github.com/apache/paimon/issues/6931>, we identified a
> > > correctness issue in filter pushdown when Spark SQL uses the
> > > null-safe equality operator (<=>).
> > > The root cause is that Paimon currently treats both regular equality
> > > (=) and null-safe equality (<=>) as the same Equal operator during
> > > predicate pushdown. Moreover, their negations are uniformly simplified
> > > into a generic NotEqual predicate, which does not account for the
> > > distinct semantics of !<=>, particularly the fact that it can be true
> > > when the column contains NULL.
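> > >
> > > To make the difference concrete, here is a small Spark snippet (a
> > > minimal sketch assuming a local Spark 3.x session; the table and values
> > > are hypothetical):
> > >
> > > > import org.apache.spark.sql.SparkSession
> > > >
> > > > val spark = SparkSession.builder()
> > > >   .appName("null-safe-demo").master("local[1]").getOrCreate()
> > > > import spark.implicits._
> > > >
> > > > Seq(Some(1), Some(2), None).toDF("col").createOrReplaceTempView("t")
> > > >
> > > > // col != 1: NULL != 1 evaluates to NULL, so only the row with 2 is returned.
> > > > spark.sql("SELECT * FROM t WHERE col != 1").show()
> > > >
> > > > // NOT (col <=> 1): NULL <=> 1 is FALSE, so its negation is TRUE and
> > > > // both the row with 2 and the NULL row are returned.
> > > > spark.sql("SELECT * FROM t WHERE NOT (col <=> 1)").show()
> > >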
> > > To address this, I propose introducing two new filter operators:
> > >
> > >    - SafeEqual: Semantically identical to Equal (used when the literal is
> > >    non-null).
> > >    - NotSafeEqual: Specifically for !(col <=> literal), with a test()
> > >    method that respects null-safe semantics:
> > >
> > > > @Override
> > > > public boolean test(
> > > >         DataType type, long rowCount, Object min, Object max,
> > > >         Long nullCount, Object literal) {
> > > >     // According to the semantics of NotSafeEqual, as long as the file
> > > >     // contains nulls, it may contain rows satisfying !(col <=> literal).
> > > >     if (nullCount != null && nullCount > 0) {
> > > >         return true;
> > > >     }
> > > >     return compareLiteral(type, literal, min) != 0
> > > >             || compareLiteral(type, literal, max) != 0;
> > > > }
> > > >
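> > > To double-check the skipping logic, here is a self-contained sketch that
> > > mirrors the proposed test() (Scala for brevity; Option stands in for the
> > > nullable Long nullCount, and the statistics are illustrative):
> > >
> > > > // Keep the file if it has nulls, or if some value may differ from the literal.
> > > > def notSafeEqualTest(min: Int, max: Int, nullCount: Option[Long], literal: Int): Boolean =
> > > >   nullCount.exists(_ > 0) || min != literal || max != literal
> > > >
> > > > assert(notSafeEqualTest(1, 1, Some(3L), 1))  // file contains NULLs: keep
> > > > assert(!notSafeEqualTest(1, 1, Some(0L), 1)) // every row equals 1: skip
> > > > assert(notSafeEqualTest(1, 5, Some(0L), 1))  // values may differ: keep
> > > >
> > >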
> > > I’ve also updated SparkV2FilterConverter.convert to properly route
> > > EQUAL_NULL_SAFE:
> > >
> > > > case EQUAL_NULL_SAFE =>
> > > >   sparkPredicate match {
> > > >     case BinaryPredicate(transform, literal) =>
> > > >       if (literal == null) {
> > > >         builder.isNull(transform)
> > > >       } else {
> > > >         // previously: builder.equal(transform, literal)
> > > >         builder.safeEqual(transform, literal)
> > > >       }
> > > >     case _ =>
> > > >       throw new UnsupportedOperationException(
> > > >         s"Convert $sparkPredicate is unsupported.")
> > > >   }
> > > >
> > > This ensures:
> > >
> > >    - col <=> null → isNull(col)
> > >    - col <=> value → safeEqual(col, value)
> > >    - !(col <=> value) → notSafeEqual(col, value) (see the sketch below)
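> > >
> > > The negation routing itself is not shown above; here is a minimal,
> > > hypothetical sketch of how it could look (it assumes the negation
> > > arrives as a NOT predicate wrapping the EQUAL_NULL_SAFE child, with
> > > SparkPredicate aliasing Spark's v2 filter Predicate; builder.notSafeEqual
> > > is the proposed new method, not existing Paimon API):
> > >
> > > > case NOT =>
> > > >   sparkPredicate.children()(0) match {
> > > >     case child: SparkPredicate if child.name() == EQUAL_NULL_SAFE =>
> > > >       child match {
> > > >         // hypothetical: route the null-safe child to the proposed operator
> > > >         case BinaryPredicate(transform, literal) if literal != null =>
> > > >           builder.notSafeEqual(transform, literal)
> > > >       }
> > > >     case other =>
> > > >       // fall back to the existing negation handling for other children
> > > >       throw new UnsupportedOperationException(s"Convert $other is unsupported.")
> > > >   }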
> > >
> > > With these changes, file skipping becomes both correct and efficient,
> > > aligning Paimon’s behavior with Spark’s evaluation semantics.
> > >
> > > I’m happy to submit a PR for this fix and welcome any feedback on the
> > > design.
> > >
> > > Best regards 😀
> >
