[ 
https://issues.apache.org/jira/browse/PARQUET-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17316469#comment-17316469
 ] 

Antoine Pitrou commented on PARQUET-1222:
-----------------------------------------

Some answers after looking through the code:
* parquet-cpp does not read nor write ColumnIndex
* our handling of min_value and max_value on the read path is naive. We use the 
same comparisons regardless of whether ColumnOrder is present or not. In 
particular, we use native type-specific greater-or-equal comparison (e.g. 
floating-point comparison), which is due to fail with NaNs (but will succeed 
with signed zeros).



> Specify a well-defined sorting order for float and double types
> ---------------------------------------------------------------
>
>                 Key: PARQUET-1222
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1222
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-format
>            Reporter: Zoltan Ivanfi
>            Priority: Critical
>
> Currently parquet-format specifies the sort order for floating point numbers 
> as follows:
> {code:java}
>    *   FLOAT - signed comparison of the represented value
>    *   DOUBLE - signed comparison of the represented value
> {code}
> The problem is that the comparison of floating point numbers is only a 
> partial ordering with strange behaviour in specific corner cases. For 
> example, according to IEEE 754, -0 is neither less nor more than \+0 and 
> comparing NaN to anything always returns false. This ordering is not suitable 
> for statistics. Additionally, the Java implementation already uses a 
> different (total) ordering that handles these cases correctly but differently 
> than the C\+\+ implementations, which leads to interoperability problems.
> TypeDefinedOrder for doubles and floats should be deprecated and a new 
> TotalFloatingPointOrder should be introduced. The default for writing doubles 
> and floats would be the new TotalFloatingPointOrder. This ordering should be 
> effective and easy to implement in all programming languages.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to