JFinis commented on code in PR #196:
URL: https://github.com/apache/parquet-format/pull/196#discussion_r1146080282


##########
README.md:
##########
@@ -163,18 +163,25 @@ following rules:
       [Thrift definition](src/main/thrift/parquet.thrift) in the
       `ColumnOrder` union. They are summarized here but the Thrift definition
       is considered authoritative:
-      * NaNs should not be written to min or max statistics fields.
-      * If the computed max value is zero (whether negative or positive),
-        `+0.0` should be written into the max statistics field.
-      * If the computed min value is zero (whether negative or positive),
-        `-0.0` should be written into the min statistics field.
-
-      For backwards compatibility when reading files:
-      * If the min is a NaN, it should be ignored.
-      * If the max is a NaN, it should be ignored.
-      * If the min is +0, the row group may contain -0 values as well.
-      * If the max is -0, the row group may contain +0 values as well.
-      * When looking for NaN values, min and max should be ignored.
+      * The following compatibility rules should be applied when reading statistics:
+        * If the nan_count field is set to > 0 and both min and max are

Review Comment:
   Regarding this suggestion:
   
   > Seems it's a little strict here? Just ignoring min-max seems ok?
   
   Note that the line you mentioned only tells a reader that they *can* rely
   on this information, and could therefore, e.g., skip this page if a
   predicate like `x = 12.34` was used. A reader can of course also choose to
   ignore this information and scan the page instead of skipping it. If we
   removed this line, a reader could no longer skip the page in this case.
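
   To illustrate the reader-side choice described above, here is a minimal
   Python sketch of how a reader might decide whether a page can be skipped
   for an equality predicate such as `x = 12.34`. `PageStats`,
   `non_null_count`, `_usable_bound` and `can_skip_for_equals` are
   hypothetical names, not taken from the Thrift definition or any library.
   Treating `nan_count == non_null_count` as "NaN-only page" is purely an
   assumption, since how to detect NaN-only pages is still open. The NaN-bound
   handling follows the read-side compatibility rules in the removed README
   lines.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageStats:
    """Hypothetical, simplified view of one page's float/double statistics."""

    min_value: Optional[float]  # None if the writer omitted min
    max_value: Optional[float]  # None if the writer omitted max
    nan_count: Optional[int]    # None if the writer did not set nan_count
    non_null_count: int         # hypothetical: number of non-null values in the page


def _usable_bound(bound: Optional[float]) -> Optional[float]:
    """Apply the old read-side rule: a NaN min or max is ignored."""
    if bound is None or math.isnan(bound):
        return None
    return bound


def can_skip_for_equals(stats: PageStats, literal: float) -> bool:
    """Return True only if no value in the page can satisfy `x = literal`.

    Returning False is always safe: the reader simply scans the page.
    """
    if math.isnan(literal):
        # Engines differ on NaN equality semantics, so stay conservative.
        return False

    # ASSUMPTION (hypothetical NaN-only detection): every non-null value is
    # NaN (vacuously true for an all-null page), so none equals the literal.
    if stats.nan_count is not None and stats.nan_count == stats.non_null_count:
        return True

    lo = _usable_bound(stats.min_value)
    hi = _usable_bound(stats.max_value)
    # Python compares -0.0 == +0.0, so a +0 min (or -0 max) never excludes
    # a zero literal of the other sign, matching the signed-zero rules.
    if lo is not None and literal < lo:
        return True
    if hi is not None and literal > hi:
        return True
    return False
```

   For example, `can_skip_for_equals(PageStats(None, None, 5, 5), 12.34)`
   returns `True`. A reader that chooses not to rely on the NaN-only branch
   can drop it and will simply scan more pages, which is always correct.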
   
   I think this is related to your more general question: how do we detect
   NaN-only pages? Depending on how we resolve that, this line will be adapted
   accordingly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.