rdblue commented on a change in pull request #1747: URL: https://github.com/apache/iceberg/pull/1747#discussion_r529813847
########## File path: api/src/main/java/org/apache/iceberg/expressions/Expressions.java ########## @@ -123,6 +123,22 @@ public static Expression not(Expression child) { return new UnboundPredicate<>(Expression.Operation.NOT_NULL, expr); } + public static <T> UnboundPredicate<T> isNaN(String name) { + return new UnboundPredicate<>(Expression.Operation.IS_NAN, ref(name)); + } + + public static <T> UnboundPredicate<T> isNaN(UnboundTerm<T> expr) { + return new UnboundPredicate<>(Expression.Operation.IS_NAN, expr); + } + + public static <T> UnboundPredicate<T> notNaN(String name) { + return new UnboundPredicate<>(Expression.Operation.NOT_NAN, ref(name)); + } + + public static <T> UnboundPredicate<T> notNaN(UnboundTerm<T> expr) { + return new UnboundPredicate<>(Expression.Operation.NOT_NAN, expr); + } Review comment: > If I understand correctly, to reject NaN in any predicate sounds like we might go back to the idea of rewriting equals in SparkFilters Yes. If the engine generally uses `d = NaN` then we can convert that to `isNaN`. But that would be engine-dependent and the Iceberg expression API would not support equals with NaN. > are we able to rely on engine to do this check before translating query to Expression? I think so. Most engines will optimize the SQL expressions and handle this already. If not, then it would result in an exception from Iceberg to the user. I think that's okay, too, because as I said above, we want to fail if a NaN is used in an expression with a non-floating-point column, not rewrite to false. > And seems like this may only impact eq as we decided to do input validation on other lg/lteq/gt/gteq and in anyway? Yes. This makes all of the handling in `Expressions` consistent: always reject NaN values. > that may sound backward incompatible until the engine starts to rewrite NaN? I'm not convinced either way. You could argue that `d = NaN` is ambiguous and that rejecting it is now fixing a bug. That's certainly the case with `d > NaN`, which is not defined. On the other hand, there was _some_ bevhavior before that will now no longer work. So I'd be up for fixing this in Flink and Spark conversions as soon as we can. Feel free to ping me on Slack! ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org