[ https://issues.apache.org/jira/browse/PARQUET-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Antoine Pitrou resolved PARQUET-1655.
-------------------------------------
Fix Version/s: cpp-1.6.0
Resolution: Fixed
Issue resolved by pull request 9582
[https://github.com/apache/arrow/pull/9582]
> [C++] Decimal comparisons used for min/max statistics are not correct
> ---------------------------------------------------------------------
>
> Key: PARQUET-1655
> URL: https://issues.apache.org/jira/browse/PARQUET-1655
> Project: Parquet
> Issue Type: Bug
> Components: parquet-cpp
> Reporter: Philip Felton
> Assignee: Micah Kornfield
> Priority: Major
> Labels: pull-request-available
> Fix For: cpp-1.6.0
>
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> The [Parquet Format specification|https://github.com/apache/parquet-format/blob/master/LogicalTypes.md]
> says:
> bq. If the column uses int32 or int64 physical types, then signed comparison
> of the integer values produces the correct ordering. If the physical type is
> fixed, then the correct ordering can be produced by flipping the
> most-significant bit in the first byte and then using unsigned byte-wise
> comparison.
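The ordering rule quoted above can be sketched in C++ as follows. `DecimalLess` is a hypothetical helper, not part of the parquet-cpp API. It assumes both values are DECIMAL payloads stored as big-endian two's-complement fixed-length byte arrays of the same, non-zero length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the parquet-cpp API): returns true if a < b,
// where a and b are big-endian two's-complement decimals of equal length.
bool DecimalLess(const std::vector<uint8_t>& a, const std::vector<uint8_t>& b) {
  assert(a.size() == b.size() && !a.empty());
  for (std::size_t i = 0; i < a.size(); ++i) {
    uint8_t x = a[i];
    uint8_t y = b[i];
    if (i == 0) {
      // Flip the most-significant (sign) bit of the first byte so that
      // negative values sort before non-negative ones.
      x = static_cast<uint8_t>(x ^ 0x80u);
      y = static_cast<uint8_t>(y ^ 0x80u);
    }
    // Every byte is then compared as unsigned.
    if (x != y) return x < y;
  }
  return false;  // the values are equal
}
```

For example, with 2-byte values, -1 (`0xFF 0xFF`) becomes `0x7F 0xFF` after the flip and 1 (`0x00 0x01`) becomes `0x80 0x01`, so `DecimalLess` reports -1 < 1.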
> However, this isn't followed in the C++ Parquet code: 16-byte decimal
> comparison is implemented as a lexicographical comparison of signed chars.
> This appears to be because the function at
> [https://github.com/apache/arrow/blob/master/cpp/src/parquet/statistics.cc#L183]
> chooses the comparator based only on the sort_order (signed) and the
> physical_type (FIXED_LENGTH_BYTE_ARRAY); there is no override for decimal.
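A minimal sketch of why a signed-char lexicographical comparison goes wrong, assuming the same big-endian two's-complement layout. `SignedCharLexLess` only approximates the behaviour described in this report (it is not the statistics.cc code), and `DecimalLessSignedFirstByte` is a hypothetical fix. Reading the first byte as signed is correct, since it carries the sign, but every following byte must be compared as unsigned. With signed chars, 128 (`0x00 0x80`) sorts before 1 (`0x00 0x01`) because the second byte `0x80` is read as -128. Comparing the first byte as signed and the rest as unsigned gives the same ordering as the specification's "flip the most-significant bit, then compare unsigned" rule.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Approximation of the reported behaviour: every byte is compared as a
// signed char (the int8_t cast assumes two's-complement, guaranteed in C++20).
bool SignedCharLexLess(const std::vector<uint8_t>& a, const std::vector<uint8_t>& b) {
  return std::lexicographical_compare(
      a.begin(), a.end(), b.begin(), b.end(),
      [](uint8_t x, uint8_t y) {
        return static_cast<int8_t>(x) < static_cast<int8_t>(y);
      });
}

// Hypothetical fix: only the first byte carries the sign, so compare it as
// signed and the remaining bytes as unsigned. Assumes equal, non-zero lengths.
bool DecimalLessSignedFirstByte(const std::vector<uint8_t>& a, const std::vector<uint8_t>& b) {
  assert(a.size() == b.size() && !a.empty());
  if (a[0] != b[0]) {
    return static_cast<int8_t>(a[0]) < static_cast<int8_t>(b[0]);
  }
  // std::lexicographical_compare on uint8_t uses unsigned operator<.
  return std::lexicographical_compare(a.begin() + 1, a.end(), b.begin() + 1, b.end());
}
```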
--
This message was sent by Atlassian Jira
(v8.3.4#803005)