yyanyy commented on pull request #1747:
URL: https://github.com/apache/iceberg/pull/1747#issuecomment-725836609


   > > I was thinking how we should change metrics evaluators when we exclude 
NaN from upper/lower bounds. Here's a table . . .
   > 
   > I think we should not produce predicates that use `NaN` as a literal for 
comparison. We can easily rewrite `equal` and `notEqual` to `isNaN` and 
`notNaN`. We can also rewrite `in` and `notIn` to `or(in(non-NaNs), isNaN)` or 
`and(notIn(non-NaNs), isNotNaN)`. Then inequalities would either be converted 
to `alwaysFalse` or throw an exception because we don't accept the predicate. 
I'd lean toward throwing an exception if someone uses `floatCol < NaN`.
   
   Thank you for all the comments! I'll update `Expressions` to include 
these rewrites in this PR. 
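
A minimal sketch of the rewrites suggested in the quoted comment, to make the intended behavior concrete. It is not Iceberg's `Expressions` API: the class `NanRewriteSketch` and its methods are hypothetical, and predicates are modelled as plain strings so the example stays self-contained. It assumes the "throw an exception" option for inequalities against `NaN`, which is the preference stated above but not a settled decision.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Iceberg.
final class NanRewriteSketch {
  private NanRewriteSketch() {}

  // equal(col, NaN) becomes isNaN(col); other literals pass through.
  static String rewriteEqual(String col, double lit) {
    return Double.isNaN(lit) ? "isNaN(" + col + ")" : "equal(" + col + ", " + lit + ")";
  }

  // notEqual(col, NaN) becomes notNaN(col); other literals pass through.
  static String rewriteNotEqual(String col, double lit) {
    return Double.isNaN(lit) ? "notNaN(" + col + ")" : "notEqual(" + col + ", " + lit + ")";
  }

  // in(col, lits) with a NaN literal becomes or(in(non-NaNs), isNaN).
  // Simplifying an empty non-NaN list is left out to keep the sketch short.
  static String rewriteIn(String col, List<Double> lits) {
    List<Double> nonNaNs = withoutNaN(lits);
    String in = "in(" + col + ", " + nonNaNs + ")";
    return nonNaNs.size() < lits.size() ? "or(" + in + ", isNaN(" + col + "))" : in;
  }

  // notIn(col, lits) with a NaN literal becomes and(notIn(non-NaNs), notNaN).
  static String rewriteNotIn(String col, List<Double> lits) {
    List<Double> nonNaNs = withoutNaN(lits);
    String notIn = "notIn(" + col + ", " + nonNaNs + ")";
    return nonNaNs.size() < lits.size() ? "and(" + notIn + ", notNaN(" + col + "))" : notIn;
  }

  // Inequalities against NaN are rejected (assumed choice; alwaysFalse is the alternative).
  static String rewriteLessThan(String col, double lit) {
    if (Double.isNaN(lit)) {
      throw new IllegalArgumentException("Cannot compare " + col + " to NaN with an inequality");
    }
    return "lessThan(" + col + ", " + lit + ")";
  }

  // Returns a copy of the literals with every NaN removed.
  private static List<Double> withoutNaN(List<Double> lits) {
    List<Double> result = new ArrayList<>();
    for (double v : lits) {
      if (!Double.isNaN(v)) {
        result.add(v);
      }
    }
    return result;
  }
}
```

With this sketch, `rewriteEqual("floatCol", Double.NaN)` yields an `isNaN` predicate, while `rewriteLessThan("floatCol", Double.NaN)` throws instead of producing a predicate that carries `NaN` as a comparison literal.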
   
   Do you have any comments on the case where "this may result in v2 returning 
more files than v1", i.e. when the literal is not NaN but the data being 
compared contains NaN? We might need to accept that in order to keep the 
behavior of comparing with NaN consistent across different files. 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


