[ https://issues.apache.org/jira/browse/PARQUET-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17316469#comment-17316469 ]
Antoine Pitrou commented on PARQUET-1222:
-----------------------------------------

Some answers after looking through the code:
* parquet-cpp neither reads nor writes ColumnIndex
* our handling of min_value and max_value on the read path is naive. We use the same comparisons regardless of whether ColumnOrder is present or not. In particular, we use native type-specific greater-or-equal comparison (e.g. floating-point comparison), which is bound to fail with NaNs (but will succeed with signed zeros).

> Specify a well-defined sorting order for float and double types
> ---------------------------------------------------------------
>
>                 Key: PARQUET-1222
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1222
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-format
>            Reporter: Zoltan Ivanfi
>            Priority: Critical
>
> Currently parquet-format specifies the sort order for floating point numbers
> as follows:
> {code:java}
>    * FLOAT - signed comparison of the represented value
>    * DOUBLE - signed comparison of the represented value
> {code}
> The problem is that the comparison of floating point numbers is only a
> partial ordering with strange behaviour in specific corner cases. For
> example, according to IEEE 754, -0 is neither less nor more than +0 and
> comparing NaN to anything always returns false. This ordering is not suitable
> for statistics. Additionally, the Java implementation already uses a
> different (total) ordering that handles these cases correctly but differently
> than the C++ implementations, which leads to interoperability problems.
> TypeDefinedOrder for doubles and floats should be deprecated and a new
> TotalFloatingPointOrder should be introduced. The default for writing doubles
> and floats would be the new TotalFloatingPointOrder. This ordering should be
> effective and easy to implement in all programming languages.
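
To make the corner cases above concrete, here is a minimal sketch in Python. It first shows how native greater-or-equal behaves with NaN and signed zeros, then sorts the same values with a bit-pattern key that yields a total order. The helper name total_order_key is hypothetical, and the ordering it produces (negative NaN < -inf < -0 < +0 < +inf < positive NaN) is only one well-known way to totally order IEEE 754 doubles; it is an assumption for illustration, not necessarily the TotalFloatingPointOrder proposed in the issue, and it is not what parquet-cpp does.

```python
import math
import struct


def total_order_key(x: float) -> int:
    """Map a double to an unsigned 64-bit integer whose integer order is total.

    Hypothetical helper for illustration only.
    """
    # Reinterpret the IEEE 754 double as an unsigned 64-bit integer.
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    if bits & (1 << 63):
        # Sign bit set: flip every bit so larger magnitudes sort lower.
        return bits ^ 0xFFFFFFFFFFFFFFFF
    # Sign bit clear: set it so these values sort above all negative ones.
    return bits | (1 << 63)


# Build a NaN with a known (positive) sign so the result is deterministic.
pos_nan = math.copysign(float("nan"), 1.0)

# Native >= is only a partial order: NaN is unordered against everything.
print(pos_nan >= 1.0, 1.0 >= pos_nan)  # False False

# Signed zeros compare equal, so >= holds in both directions.
print(-0.0 >= 0.0, 0.0 >= -0.0)  # True True

# With the bit-pattern key, every value has a definite position.
values = [1.5, pos_nan, 0.0, -0.0, float("-inf")]
print(sorted(values, key=total_order_key))  # [-inf, -0.0, 0.0, 1.5, nan]
```

With a key like this, min/max statistics could be computed with plain integer comparisons, at the cost of treating -0 and +0 as distinct values and giving NaNs a fixed place at the extremes.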