JFinis commented on code in PR #196:
URL: https://github.com/apache/parquet-format/pull/196#discussion_r1146067341


##########
src/main/thrift/parquet.thrift:
##########
@@ -886,16 +888,23 @@ union ColumnOrder {
    *   FIXED_LEN_BYTE_ARRAY - unsigned byte-wise comparison
    *
    * (*) Because the sorting order is not specified properly for floating
-   *     point values (relations vs. total ordering) the following
-   *     compatibility rules should be applied when reading statistics:
-   *     - If the min is a NaN, it should be ignored.
-   *     - If the max is a NaN, it should be ignored.
+   *     point values (relations vs. total ordering), the following compatibility
+   *     rules should be applied when reading statistics:
+   *     - If the nan_count field is set to > 0 and both min and max are
+   *       NaN, a reader can rely on that all non-NULL values are NaN
+   *     - Otherwise, if the min or the max is a NaN, it should be ignored.
+   *     - When looking for NaN values, min and max should be ignored;
+   *       if the nan_count field is set, it can be used to check whether
+   *       NaNs are present.
    *     - If the min is +0, the row group may contain -0 values as well.
    *     - If the max is -0, the row group may contain +0 values as well.
-   *     - When looking for NaN values, min and max should be ignored.
    * 
    *     When writing statistics the following rules should be followed:
-   *     - NaNs should not be written to min or max statistics fields.
+   *     - The nan_count fields should always be set for FLOAT and DOUBLE columns.

Review Comment:
   okay, I can soften the wording here 👍 
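
For reference, the reader-side rules in the hunk above could be sketched roughly as follows. This is only an illustration in Python, not code from the PR or from any Parquet implementation: the function names `effective_bounds` and `may_contain_nan` are hypothetical, and `min_val`, `max_val`, and `nan_count` stand in for the already-decoded statistics fields, with `None` meaning the field is not set.

```python
import math
from typing import Optional, Tuple


def effective_bounds(
    min_val: Optional[float],
    max_val: Optional[float],
    nan_count: Optional[int],
) -> Tuple[Optional[float], Optional[float], bool]:
    """Return (lower, upper, all_nan) following the reader rules in the hunk.

    A bound of None means "no usable bound". Hypothetical helper, for
    illustration only.
    """
    min_is_nan = min_val is not None and math.isnan(min_val)
    max_is_nan = max_val is not None and math.isnan(max_val)

    # nan_count > 0 and both min and max are NaN: all non-NULL values are NaN.
    if nan_count is not None and nan_count > 0 and min_is_nan and max_is_nan:
        return None, None, True

    # Otherwise a NaN min or max is ignored.
    lower = None if min_is_nan else min_val
    upper = None if max_is_nan else max_val

    # A min of +0 may hide -0 values and a max of -0 may hide +0 values,
    # so a zero bound is widened to the zero of the safe sign.
    if lower is not None and lower == 0.0:
        lower = -0.0
    if upper is not None and upper == 0.0:
        upper = 0.0

    return lower, upper, False


def may_contain_nan(nan_count: Optional[int]) -> Optional[bool]:
    """Answer "are NaNs present?" from nan_count only, never from min/max.

    Returns None when nan_count is not set, meaning the reader cannot tell.
    """
    if nan_count is None:
        return None
    return nan_count > 0
```

A reader that prunes row groups would treat a `None` bound as unbounded on that side, and a `None` result from `may_contain_nan` as "cannot skip".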
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

