Hi Apollo,

Thanks for reporting this issue in dev.
I think we can solve this without introducing new predicates. For EqualNullSafe:

1. If the literal is null, just convert it to IsNull.
2. If the literal is not null, convert it to (col IS NOT NULL AND col = literal).

(A rough sketch of this rewrite is appended after the quoted mail below.)

What do you think?

Best,
Jingsong

On Thu, Jan 1, 2026 at 12:42 PM Apollo Elon <[email protected]> wrote:
>
> Hi Paimon Dev Team,
>
> In GitHub Issue #6931 <https://github.com/apache/paimon/issues/6931>, we
> identified a correctness issue in filter pushdown when Spark SQL uses the
> null-safe equality operator (<=>).
>
> The root cause is that Paimon currently treats both regular equality (=)
> and null-safe equality (<=>) as the same Equal operator during predicate
> pushdown. Moreover, their negations are uniformly simplified into a
> generic NotEqual predicate, which does not account for the distinct
> semantics of !(col <=> literal), particularly the fact that it can be
> true when the column contains NULL.
>
> To address this, I propose introducing two new filter operators:
>
> - SafeEqual: semantically identical to Equal (used when the literal is
>   non-null).
> - NotSafeEqual: specifically for !(col <=> literal), with a test() method
>   that respects null-safe semantics:
>
> @Override
> public boolean test(
>         DataType type, long rowCount, Object min, Object max, Long nullCount, Object literal) {
>     // Under the semantics of !(col <=> literal), a file that contains any
>     // nulls may contain matching rows, so it must not be skipped.
>     if (!Objects.isNull(nullCount) && nullCount > 0) {
>         return true;
>     }
>     return compareLiteral(type, literal, min) != 0 || compareLiteral(type, literal, max) != 0;
> }
>
> I’ve also updated SparkV2FilterConverter.convert to properly route
> EQUAL_NULL_SAFE:
>
> case EQUAL_NULL_SAFE =>
>   sparkPredicate match {
>     case BinaryPredicate(transform, literal) =>
>       if (literal == null) {
>         builder.isNull(transform)
>       } else {
>         // builder.equal(transform, literal)
>         builder.safeEqual(transform, literal)
>       }
>     case _ =>
>       throw new UnsupportedOperationException(s"Convert $sparkPredicate is unsupported.")
>   }
>
> This ensures:
>
> - col <=> null → isNull(col)
> - col <=> value → safeEqual(col, value)
> - !(col <=> value) → notSafeEqual(col, value)
>
> With these changes, file skipping becomes both correct and efficient,
> aligning Paimon's behavior with Spark's evaluation semantics.
>
> I'm happy to submit a PR for this fix and welcome any feedback on the
> design.
>
> Best regards 😀
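
A minimal sketch of the rewrite proposed at the top of this mail, shaped like
the SparkV2FilterConverter snippet quoted above. It assumes the converter's
builder exposes isNotNull and an and(...) combinator alongside the
isNull/equal calls already shown; the actual Paimon API names may differ.

case EQUAL_NULL_SAFE =>
  sparkPredicate match {
    case BinaryPredicate(transform, literal) =>
      if (literal == null) {
        // col <=> NULL is true only for NULL rows.
        builder.isNull(transform)
      } else {
        // col <=> lit with a non-null literal behaves like plain equality,
        // except that NULL rows must not match, hence the extra IsNotNull.
        builder.and(builder.isNotNull(transform), builder.equal(transform, literal))
      }
    case _ =>
      throw new UnsupportedOperationException(s"Convert $sparkPredicate is unsupported.")
  }

With this rewrite, no new predicate types are needed: the existing IsNull,
IsNotNull, Equal, and And predicates cover both branches.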

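For reference, a small standalone sketch of how the stats check in the quoted
NotSafeEqual.test() drives file skipping. The helper name and the simplified
Int-only signature are hypothetical, not Paimon's actual API:

// Mirrors the quoted logic: a file may be skipped only when this returns false.
def mayMatchNotSafeEqual(min: Int, max: Int, nullCount: Long, literal: Int): Boolean =
  if (nullCount > 0) true                  // null rows always satisfy !(col <=> literal)
  else min != literal || max != literal    // false only if every value equals the literal

// mayMatchNotSafeEqual(min = 5, max = 5, nullCount = 0, literal = 5) == false  -> skip file
// mayMatchNotSafeEqual(min = 5, max = 5, nullCount = 3, literal = 5) == true   -> keep file (nulls match)
// mayMatchNotSafeEqual(min = 1, max = 9, nullCount = 0, literal = 5) == true   -> keep file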