[
https://issues.apache.org/jira/browse/ARROW-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932399#comment-16932399
]
Antoine Pitrou commented on ARROW-6149:
---------------------------------------
cc [~wesmckinn]
> [Parquet] Decimal comparisons used for min/max statistics are not correct
> -------------------------------------------------------------------------
>
> Key: ARROW-6149
> URL: https://issues.apache.org/jira/browse/ARROW-6149
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Affects Versions: 0.14.1
> Reporter: Philip Felton
> Priority: Major
> Fix For: 1.0.0
>
>
> The [Parquet Format
> specification|https://github.com/apache/parquet-format/blob/master/LogicalTypes.md]
> says:
> bq. If the column uses int32 or int64 physical types, then signed comparison
> of the integer values produces the correct ordering. If the physical type is
> fixed, then the correct ordering can be produced by flipping the
> most-significant bit in the first byte and then using unsigned byte-wise
> comparison.
> However, the C++ Parquet code does not follow this. 16-byte decimal
> comparison is implemented as a lexicographical comparison of signed chars.
> This appears to be because the function at
> [https://github.com/apache/arrow/blob/master/cpp/src/parquet/statistics.cc#L183]
> selects the comparison from the sort_order (signed) and physical_type
> (FIXED_LENGTH_BYTE_ARRAY) alone; there is no override for decimal.
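
For illustration, the difference between the two orderings described above can be sketched in a few lines of C++. This is only a minimal sketch, not the code in statistics.cc: the helper names SignedBytewiseLess and DecimalLess are hypothetical, and it assumes both values have the same length and are stored as big-endian two's complement, as the Parquet format specifies for fixed-length decimals.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration; these are NOT Arrow/Parquet APIs.

// Lexicographical comparison that treats every byte as a signed char,
// i.e. the behaviour the report describes.
bool SignedBytewiseLess(const uint8_t* a, const uint8_t* b, std::size_t len) {
  for (std::size_t i = 0; i < len; ++i) {
    const auto x = static_cast<int8_t>(a[i]);
    const auto y = static_cast<int8_t>(b[i]);
    if (x != y) return x < y;
  }
  return false;
}

// Ordering from the quoted spec text: flip the most-significant bit of the
// first byte, then compare all bytes as unsigned values.
bool DecimalLess(const uint8_t* a, const uint8_t* b, std::size_t len) {
  if (len == 0) return false;
  const auto a0 = static_cast<uint8_t>(a[0] ^ 0x80);
  const auto b0 = static_cast<uint8_t>(b[0] ^ 0x80);
  if (a0 != b0) return a0 < b0;
  // memcmp compares the remaining bytes as unsigned char.
  return std::memcmp(a + 1, b + 1, len - 1) < 0;
}
```

With two-byte values, 127 is {0x00, 0x7F} and 128 is {0x00, 0x80}. The signed-char comparison reads the second byte of 128 as -128 and so orders 128 before 127, whereas the flip-then-unsigned comparison orders them correctly.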