[ https://issues.apache.org/jira/browse/PARQUET-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney updated PARQUET-1655:
----------------------------------
    Summary: [C++] Decimal comparisons used for min/max statistics are not correct  (was: [Parquet] Decimal comparisons used for min/max statistics are not correct)

> [C++] Decimal comparisons used for min/max statistics are not correct
> ---------------------------------------------------------------------
>
>                 Key: PARQUET-1655
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1655
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>            Reporter: Philip Felton
>            Priority: Major
>
> The [Parquet Format
> specification|https://github.com/apache/parquet-format/blob/master/LogicalTypes.md]
> says
> bq. If the column uses int32 or int64 physical types, then signed comparison
> of the integer values produces the correct ordering. If the physical type is
> fixed, then the correct ordering can be produced by flipping the
> most-significant bit in the first byte and then using unsigned byte-wise
> comparison.
> However, this isn't followed in the C++ Parquet code. 16-byte decimal
> comparison is implemented using a lexicographical comparison of signed chars.
> This appears to be because the function at
> [https://github.com/apache/arrow/blob/master/cpp/src/parquet/statistics.cc#L183]
> selects the comparator from the sort_order (signed) and physical_type
> (FIXED_LENGTH_BYTE_ARRAY) alone; there is no override for decimal.
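
To illustrate the difference described in the report, here is a minimal sketch contrasting the two orderings on 16-byte big-endian two's-complement values. It is not the parquet-cpp code: the names Decimal16, FromInt64, LessSignedChars and LessDecimal are hypothetical and exist only for this example, and the fixed width of 16 bytes is an assumption taken from the report.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Hypothetical type for this sketch: a 16-byte big-endian two's-complement
// decimal, as stored in a FIXED_LEN_BYTE_ARRAY column.
using Decimal16 = std::array<std::uint8_t, 16>;

// Hypothetical helper: sign-extend a 64-bit integer into 16 big-endian bytes.
Decimal16 FromInt64(std::int64_t v) {
  Decimal16 out;
  out.fill(v < 0 ? 0xFF : 0x00);  // sign extension for the upper 8 bytes
  const std::uint64_t u = static_cast<std::uint64_t>(v);
  for (int i = 0; i < 8; ++i) {
    out[15 - i] = static_cast<std::uint8_t>(u >> (8 * i));
  }
  return out;
}

// The behaviour the report describes: lexicographical comparison where
// every byte is treated as a signed char.
bool LessSignedChars(const Decimal16& a, const Decimal16& b) {
  return std::lexicographical_compare(
      a.begin(), a.end(), b.begin(), b.end(),
      [](std::uint8_t x, std::uint8_t y) {
        return static_cast<std::int8_t>(x) < static_cast<std::int8_t>(y);
      });
}

// The ordering quoted from the format specification: flip the
// most-significant bit of the first byte, then compare bytes as unsigned.
bool LessDecimal(const Decimal16& a, const Decimal16& b) {
  const std::uint8_t a0 = static_cast<std::uint8_t>(a[0] ^ 0x80);
  const std::uint8_t b0 = static_cast<std::uint8_t>(b[0] ^ 0x80);
  if (a0 != b0) return a0 < b0;
  return std::lexicographical_compare(a.begin() + 1, a.end(),
                                      b.begin() + 1, b.end());
}
```

With these definitions, LessSignedChars(FromInt64(128), FromInt64(1)) returns true, because the trailing byte 0x80 is read as -128. LessDecimal orders the same pair correctly, and also places FromInt64(-1) before FromInt64(1).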