[ 
https://issues.apache.org/jira/browse/PARQUET-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16413807#comment-16413807
 ] 

ASF GitHub Bot commented on PARQUET-1251:
-----------------------------------------

zivanfi closed pull request #88: PARQUET-1251: Clarify ambiguous min/max stats 
for FLOAT/DOUBLE
URL: https://github.com/apache/parquet-format/pull/88
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/src/main/thrift/parquet.thrift b/src/main/thrift/parquet.thrift
index 195ff908..bee82b30 100644
--- a/src/main/thrift/parquet.thrift
+++ b/src/main/thrift/parquet.thrift
@@ -751,10 +751,19 @@ union ColumnOrder {
    *   INT32 - signed comparison
    *   INT64 - signed comparison
    *   INT96 (only used for legacy timestamps) - undefined
-   *   FLOAT - signed comparison of the represented value
-   *   DOUBLE - signed comparison of the represented value
+   *   FLOAT - signed comparison of the represented value (*)
+   *   DOUBLE - signed comparison of the represented value (*)
    *   BYTE_ARRAY - unsigned byte-wise comparison
    *   FIXED_LEN_BYTE_ARRAY - unsigned byte-wise comparison
+   *
+   * (*) Because the sorting order is not specified properly for floating
+   *     point values (relations vs. total ordering) the following
+   *     compatibility rules should be applied when reading statistics:
+   *     - If the min is a NaN, it should be ignored.
+   *     - If the max is a NaN, it should be ignored.
+   *     - If the min is +0, the row group may contain -0 values as well.
+   *     - If the max is -0, the row group may contain +0 values as well.
+   *     - When looking for NaN values, min and max should be ignored.
    */
   1: TypeDefinedOrder TYPE_ORDER;
 }


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Clarify ambiguous min/max stats for FLOAT/DOUBLE
> ------------------------------------------------
>
>                 Key: PARQUET-1251
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1251
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-format
>    Affects Versions: format-2.4.0
>            Reporter: Gabor Szadovszky
>            Assignee: Gabor Szadovszky
>            Priority: Major
>             Fix For: format-2.5.0
>
>
> Describe the handling of the ambiguous min/max statistics for FLOAT/DOUBLE 
> types in case of TypeDefinedOrder. (See PARQUET-1222 for details.)
> * When looking for NaN values, min and max should be ignored.
> * If the min is a NaN, it should be ignored.
> * If the max is a NaN, it should be ignored.
> * If the min is +0, the row group may contain -0 values as well.
> * If the max is -0, the row group may contain +0 values as well.
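
The five compatibility rules above can be sketched as a small reader-side check. This is only an illustration under stated assumptions, not code from parquet-format or any Parquet implementation: the helper names `usable_bounds` and `may_contain` are hypothetical, and statistics are assumed to be already decoded into Python floats (or `None` when absent).

```python
import math
from typing import Optional, Tuple


def usable_bounds(
    stat_min: Optional[float], stat_max: Optional[float]
) -> Tuple[Optional[float], Optional[float]]:
    """Return (lower, upper) bounds that are safe to use for pruning.

    Hypothetical helper: None means "no usable bound on this side".
    """
    # Rules: a NaN min or a NaN max should be ignored.
    lower = None if stat_min is None or math.isnan(stat_min) else stat_min
    upper = None if stat_max is None or math.isnan(stat_max) else stat_max

    # Rule: if the min is +0, the row group may contain -0 as well.
    # With IEEE comparisons -0.0 == 0.0, so this widening only matters
    # to callers that distinguish the sign (e.g. a total-order comparator).
    if lower is not None and lower == 0.0 and math.copysign(1.0, lower) > 0:
        lower = -0.0

    # Rule: if the max is -0, the row group may contain +0 as well.
    if upper is not None and upper == 0.0 and math.copysign(1.0, upper) < 0:
        upper = 0.0

    return lower, upper


def may_contain(
    value: float, stat_min: Optional[float], stat_max: Optional[float]
) -> bool:
    """Return False only if the statistics rule out `value` for the row group."""
    # Rule: when looking for NaN values, min and max should be ignored.
    if math.isnan(value):
        return True

    lower, upper = usable_bounds(stat_min, stat_max)
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True
```

A reader following this sketch would skip a row group only when `may_contain` returns `False`; every ambiguous case (a NaN bound, a NaN search value, a signed zero at the boundary) falls back to reading the row group.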



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
