Hi Jingsong,
Thank you so much for your reply and suggestions—even during the holiday! I
really appreciate it.
Initially, my intention for introducing new operators was to make the logic
under the EQUAL_NULL_SAFE branch more self-explanatory and readable.
However, I overlooked the fact that this approach would add extra files,
most of whose logic overlaps with existing ones. Thank you for pointing
this out!
I now realize that the same goal can be achieved simply by modifying the
existing code like this:
> case EQUAL_NULL_SAFE =>
>   BinaryPredicate.unapply(sparkPredicate) match {
>     case Some((fieldName, literal)) =>
>       val index = fieldIndex(fieldName)
>       if (literal == null) {
>         builder.isNull(index)
>       } else {
>         // previously just: builder.equal(index, convertLiteral(index, literal))
>         PredicateBuilder.and(
>           builder.isNotNull(index),
>           builder.equal(index, convertLiteral(index, literal))
>         )
>       }
>   }
>
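For reference, this matches Spark's null-safe equality semantics: for a non-null
literal v, c <=> v behaves like (c IS NOT NULL AND c = v), i.e. it is false
rather than NULL when c is NULL. A quick spark-shell sketch (the column name c
and the literal 1 are arbitrary, just for illustration):

    val df = Seq(Some(1), None).toDF("c")
    // c <=> 1 and (c IS NOT NULL AND c = 1) agree on every row:
    //   c = 1    -> true
    //   c = NULL -> false   (plain c = 1 would yield NULL here)
    df.selectExpr("c", "c <=> 1", "c IS NOT NULL AND c = 1").show()
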
Will this modification achieve the desired result? Thanks again for your
valuable feedback!
Best regards,
Apollo
Jingsong Li <[email protected]> wrote on Thu, Jan 1, 2026 at 17:16:
> Hi Apollo,
>
> Thanks for reporting this issue in dev.
>
> I think we can solve this without introducing new predicates.
>
> EqualNullSafe =>
>
> 1. if literal is Null, just convert it to IsNull.
> 2. if literal is not Null, convert it to (col is not Null and col = literal).
>
> What do you think?
>
> Best,
> Jingsong
>
> On Thu, Jan 1, 2026 at 12:42 PM Apollo Elon <[email protected]> wrote:
> >
> > Hi Paimon Dev Team,
> > In GitHub Issue #6931 <https://github.com/apache/paimon/issues/6931>, we
> > identified a correctness issue in filter pushdown when Spark SQL uses the
> > null-safe equality operator (<=>).
> > The root cause is that Paimon currently treats both regular equality (=)
> > and null-safe equality (<=>) as the same Equal operator during predicate
> > pushdown. Moreover, their negations are uniformly simplified into a generic
> > NotEqual predicate, which does not account for the distinct semantics of
> > !<=>, particularly the fact that it can be true when the column contains
> > NULL.
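> > For illustration (spark-shell, with an arbitrary column name and literal),
> > the difference shows up as soon as the column contains NULL:
> >
> >     val df = Seq(Some(5), Some(7), None).toDF("col")
> >     // NOT (col <=> 5) keeps the NULL row; col != 5 silently drops it.
> >     df.filter("NOT (col <=> 5)").show()   // rows with col = 7 and col = NULL
> >     df.filter("col != 5").show()          // only the row with col = 7
> >
> > This is exactly the case the NotEqual simplification misses.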
> > To address this, I propose introducing two new filter operators:
> >
> > - SafeEqual: Semantically identical to Equal (used when the literal is
> > non-null).
> > - NotSafeEqual: Specifically for !(col <=> literal), with a test()
> > method that respects null-safe semantics:
> >
> > > @Override
> > > public boolean test(
> > >         DataType type, long rowCount, Object min, Object max,
> > >         Long nullCount, Object literal) {
> > >     // According to the semantics of NotSafeEqual, as long as the file
> > >     // contains null, it meets the data condition.
> > >     if (!Objects.isNull(nullCount) && nullCount > 0) {
> > >         return true;
> > >     }
> > >     return compareLiteral(type, literal, min) != 0
> > >             || compareLiteral(type, literal, max) != 0;
> > > }
> > >
> > I’ve also updated SparkV2FilterConverter.convert to properly route
> > EQUAL_NULL_SAFE:
> >
> > > case EQUAL_NULL_SAFE =>
> > >   sparkPredicate match {
> > >     case BinaryPredicate(transform, literal) =>
> > >       if (literal == null) {
> > >         builder.isNull(transform)
> > >       } else {
> > >         // builder.equal(transform, literal)
> > >         builder.safeEqual(transform, literal)
> > >       }
> > >     case _ =>
> > >       throw new UnsupportedOperationException(
> > >         s"Convert $sparkPredicate is unsupported.")
> > >   }
> > >
> > This ensures:
> >
> > - col <=> null → isNull(col)
> > - col <=> value → safeEqual(col, value)
> > - !(col <=> value) → notSafeEqual(col, value)
> >
> > With these changes, file skipping becomes both correct and efficient,
> > aligning Paimon’s behavior with Spark’s evaluation semantics.
> >
> > I’m happy to submit a PR for this fix and welcome any feedback on the
> > design.
> >
> > Best regards 😀
>