[GitHub] [beam] robinyqiu commented on pull request #12292: [BEAM-8057] Support ZetaSQL DOUBLE +INF, -INF and NAN

GitBox Fri, 17 Jul 2020 15:13:00 -0700


robinyqiu commented on pull request #12292:
URL: https://github.com/apache/beam/pull/12292#issuecomment-660358873



   Actually this change unveils a data corruption bug (produces wrong result).
   
   It is related to NaN comparison (=, !=, >, >=, <, and <=). For example, if 
you have a query like
   
   ```
   SELECT CAST('NAN' AS FLOAT64) = CAST('NAN' AS FLOAT64)
   ```
   
   the expected result is `false` because according to ZetaSQL definition, 
`NaN` compared to **any** value should return false, but our engine returns 
true (please add a test to reproduce this.)
   
   I dug into this a bit and found it is because Calcite internally does 
simplification to comparison operations 
(https://github.com/apache/calcite/blob/3fb68f6c22a7bcbc4cb1fff114bc911b1e31c4de/core/src/main/java/org/apache/calcite/rex/RexSimplify.java#L336-L351).
 So the above expression will be simplified to (you can print the sql query in 
`BeamZetaSqlCalcRel` to verify yourself)
   
   ```
   NULL OR CAST('NAN' AS FLOAT64) IS NOT NULL
   ```
   
   The simplification is triggered because the 2 operands of `=` are "equal" 
(they are both a `RexCall` to a function named `double_nan` with no input).
   
   One way I can think of to fix this, without affecting other code path, is 
that we can generate a random double and convert it to `BigDecimal` and make it 
a input to `double_nan`. This input parameter is never used, its sole purpose 
is to make Calcite treat two calls to `double_nan` different.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [beam] robinyqiu commented on pull request #12292: [BEAM-8057] Support ZetaSQL DOUBLE +INF, -INF and NAN

Reply via email to