benkrug commented on code in PR #13671: URL: https://github.com/apache/druid/pull/13671#discussion_r1073979796
########## docs/querying/sql-operators.md: ########## @@ -37,6 +37,10 @@ sidebar_label: "Operators" Operators in [Druid SQL](./sql.md) typically operate on one or two values and return a result based on the values. Types of operators in Druid SQL include arithmetic, comparison, logical, and more, as described here. +When performing math operations, Druid uses the integer datatype unless there are double or float values. If double or float values are involved, Druid uses double. Note that the highest precision way to store digits in Druid are 64-bit integers (long) or 64-bit floats (double). In essence, a double can represent 52 binary digits, so Druid may return incorrect results for doubles if the value exceeds 2^52. + +For more information about how Java handles primitive data types and how it may impact the results you get, see [Primitive data types in Java are a matter of precision](https://blogs.oracle.com/javamagazine/post/java-primitive-datatypes-int-float-double). Review Comment: The commit is a bit off still. According to [current docs](https://druid.apache.org/docs/latest/querying/sql-data-types.html#standard-types), druid has long, float and double. "float" is a 32-bit float, and "double" is a 64-bit float. I suggested some references because the math of float precision is beyond the scope of druid docs, imo. The main points are that longs can store up to 2^63 accurately (the current commit says doubles, that should be longs, iiuc), and floats and doubles use 32-bit and 64-bit floating point. Any floating point storage format will have variable precision depending on the size of the numbers. (See the linked URLs in my earlier comment.) Floating point precision is really complicated and mathematical, beyond the scope of druid docs imo (again), and it's a general condition in software. Just saying that "float" and "double" both use floating point is the main point. We could get in to the bounds for integers being represented exactly in floats, but that ignores decimals, which is probably the point of using a float. Maybe we can also add a comment along the lines that if exact decimal values are needed, and you need, eg, 3 decimal places, you can store the number multiplied by 1000 as long, and divide again when querying. This will be exact, up to the min and max values for longs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
