[ 
https://issues.apache.org/jira/browse/PARQUET-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16413677#comment-16413677
 ] 

ASF GitHub Bot commented on PARQUET-1251:
-----------------------------------------

zivanfi commented on a change in pull request #88: PARQUET-1251: Clarify 
ambiguous min/max stats for FLOAT/DOUBLE
URL: https://github.com/apache/parquet-format/pull/88#discussion_r177049956
 
 

 ##########
 File path: src/main/thrift/parquet.thrift
 ##########
 @@ -751,10 +751,19 @@ union ColumnOrder {
    *   INT32 - signed comparison
    *   INT64 - signed comparison
    *   INT96 (only used for legacy timestamps) - undefined
-   *   FLOAT - signed comparison of the represented value
-   *   DOUBLE - signed comparison of the represented value
+   *   FLOAT - signed comparison of the represented value (*)
+   *   DOUBLE - signed comparison of the represented value (*)
    *   BYTE_ARRAY - unsigned byte-wise comparison
    *   FIXED_LEN_BYTE_ARRAY - unsigned byte-wise comparison
+   *
+   * (*) Because of the sorting order is not specified properly for floating
+   *     point values (relations vs. total ordering) the following
+   *     compatibility rules should be applied:
+   *     - When looking for NaN values, min and max should be ignored.
 
 Review comment:
   Please move this to the end of the list as this is the least important item.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Clarify ambiguous min/max stats for FLOAT/DOUBLE
> ------------------------------------------------
>
>                 Key: PARQUET-1251
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1251
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-format
>    Affects Versions: format-2.4.0
>            Reporter: Gabor Szadovszky
>            Assignee: Gabor Szadovszky
>            Priority: Major
>             Fix For: format-2.5.0
>
>
> Describe the handling of the ambiguous min/max statistics for FLOAT/DOUBLE 
> types in case of TypeDefinedOrder. (See PARQUET-1222 for details.)
> * When looking for NaN values, min and max should be ignored.
> * If the min is a NaN, it should be ignored.
> * If the max is a NaN, it should be ignored.
> * If the min is +0, the row group may contain -0 values as well.
> * If the max is -0, the row group may contain +0 values as well.
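
The rules above can be read as a reader-side check for row-group pruning. The following is only a minimal sketch under stated assumptions, not code from parquet-format or parquet-mr: the function name may_contain and its parameters are hypothetical, and min/max are assumed to be already decoded into Python floats (None when a statistic is absent).

```python
import math


def may_contain(value, stat_min, stat_max):
    """Return True if a row group with these min/max stats may hold `value`.

    Hypothetical helper illustrating the compatibility rules; a False
    result means the row group can be skipped for an equality lookup.
    """
    # Rule: when looking for NaN values, min and max are ignored.
    if math.isnan(value):
        return True

    # Rule: a NaN min or a NaN max is ignored (no bound on that side).
    lower = None if stat_min is None or math.isnan(stat_min) else stat_min
    upper = None if stat_max is None or math.isnan(stat_max) else stat_max

    # Rule: a +0 min may hide -0 values and a -0 max may hide +0 values.
    # Python float comparison treats -0.0 and 0.0 as equal, so the plain
    # < and > checks below never prune on the sign of zero.
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True
```

For example, may_contain(float("nan"), 1.0, 2.0) and may_contain(-0.0, 0.0, 1.0) both return True, so neither row group would be skipped. An implementation that compares raw bit patterns, or that uses a total ordering that distinguishes -0 from +0, would need an explicit zero check instead of relying on float equality.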



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
