Re: [PR] PARQUET-2249: Introduce IEEE 754 total order [parquet-format]

via GitHub Thu, 03 Apr 2025 03:04:08 -0700


tustvold commented on code in PR #221:
URL: https://github.com/apache/parquet-format/pull/221#discussion_r2026647690



##########
src/main/thrift/parquet.thrift:
##########
@@ -979,6 +988,58 @@ union ColumnOrder {
    *       `-0.0` should be written into the min statistics field.
    */
   1: TypeDefinedOrder TYPE_ORDER;
+
+  /*
+   * The floating point type is ordered according to the totalOrder predicate,
+   * as defined in section 5.10 of IEEE-754 (2008 revision). Only columns of
+   * physical type FLOAT or DOUBLE, or logical type FLOAT16 may use this 
ordering.
+
+   * Intuitively, this orders floats mathematically, but defines -0 to be less
+   * than +0, -NaN to be less than anything else, and +NaN to be greater than
+   * anything else. It also defines an order between different bit 
representations
+   * of the same value.
+   *
+   * The formal definition is as follows:
+   *   a) If x<y, totalOrder(x, y) is true.
+   *   b) If x>y, totalOrder(x, y) is false.
+   *   c) If x=y:
+   *     1) totalOrder(−0, +0) is true.
+   *     2) totalOrder(+0, −0) is false.
+   *     3) If x and y represent the same floating-point datum:
+   *        i) If x and y have negative sign, totalOrder(x, y) is true if and
+   *           only if the exponent of x ≥ the exponent of y
+   *       ii) otherwise totalOrder(x, y) is true if and only if the exponent
+   *           of x ≤ the exponent of y.
+   *   d) If x and y are unordered numerically because x or y is NaN:
+   *     1) totalOrder(−NaN, y) is true where −NaN represents a NaN with
+   *        negative sign bit and y is a floating-point number.
+   *     2) totalOrder(x, +NaN) is true where +NaN represents a NaN with
+   *        positive sign bit and x is a floating-point number.
+   *     3) If x and y are both NaNs, then totalOrder reflects a total ordering
+   *        based on:
+   *         i) negative sign orders below positive sign
+   *        ii) signaling orders below quiet for +NaN, reverse for −NaN
+   *       iii) lesser payload, when regarded as an integer, orders below
+   *            greater payload for +NaN, reverse for −NaN.
+   *
+   * Note that this ordering can be implemented efficiently in software
+   * by flipping all non-sign bits in case of a set sign bit to achieve a
+   * two's-complement-like representation and then performing a signed
+   * integer comparison on the resulting bits.
+   * E.g., this is a possible implementation for DOUBLE in Rust:
+   *
+   *   pub fn totalOrder(x: f64, y: f64) -> bool {

Review Comment:
   One of the major benefits of this total ordering formulation is it is 
NaN-agnostic



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] PARQUET-2249: Introduce IEEE 754 total order [parquet-format]

Reply via email to