[ 
https://issues.apache.org/jira/browse/PARQUET-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated PARQUET-1655:
----------------------------------
    Summary: [C++] Decimal comparisons used for min/max statistics are not 
correct  (was: [Parquet] Decimal comparisons used for min/max statistics are 
not correct)

> [C++] Decimal comparisons used for min/max statistics are not correct
> ---------------------------------------------------------------------
>
>                 Key: PARQUET-1655
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1655
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cpp
>            Reporter: Philip Felton
>            Priority: Major
>
> The [Parquet Format 
> specification|https://github.com/apache/parquet-format/blob/master/LogicalTypes.md]
>  says:
> bq. If the column uses int32 or int64 physical types, then signed comparison 
> of the integer values produces the correct ordering. If the physical type is 
> fixed, then the correct ordering can be produced by flipping the 
> most-significant bit in the first byte and then using unsigned byte-wise 
> comparison.
> However, this isn't followed in the C++ Parquet code. 16-byte decimal 
> comparison is implemented using a lexicographical comparison of signed chars.
> This appears to be because the function at 
> [https://github.com/apache/arrow/blob/master/cpp/src/parquet/statistics.cc#L183]
>  relies only on the sort_order (signed) and physical_type 
> (FIXED_LENGTH_BYTE_ARRAY); there is no override for decimal.
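
For illustration, the ordering the specification describes for fixed-length decimals could be sketched roughly as below. This is not the parquet-cpp code and not a proposed patch: the function name DecimalFlbaLess is hypothetical, and the sketch assumes both values are big-endian two's-complement byte arrays of the same length.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of parquet-cpp. Returns true when decimal `a`
// orders before decimal `b`. Assumes both are big-endian two's-complement
// values stored in `len` bytes each.
bool DecimalFlbaLess(const uint8_t* a, const uint8_t* b, std::size_t len) {
  if (len == 0) return false;
  // Flip the most-significant bit of the first byte so that negative values
  // sort below non-negative ones under unsigned comparison.
  const uint8_t a0 = static_cast<uint8_t>(a[0] ^ 0x80);
  const uint8_t b0 = static_cast<uint8_t>(b[0] ^ 0x80);
  if (a0 != b0) return a0 < b0;
  // The remaining bytes are compared as plain unsigned bytes.
  for (std::size_t i = 1; i < len; ++i) {
    if (a[i] != b[i]) return a[i] < b[i];
  }
  return false;  // equal values
}
```

As a small example, take the two-byte values {0x00, 0x80} (128) and {0x00, 0x01} (1). A lexicographical comparison of signed chars reads 0x80 as -128 and so places 128 before 1, whereas the sketch above orders 1 before 128.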



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
