[
https://issues.apache.org/jira/browse/PARQUET-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16205986#comment-16205986
]
Zoltan Ivanfi commented on PARQUET-1065:
----------------------------------------
Unfortunately, since INT96 timestamps are stored in little-endian order, the
first byte holds the least significant byte of the timestamp, not the most
significant one. For this reason, the value of the first byte varies widely,
spanning the whole range between 0x00 and 0xFF. As a result, when comparing
the raw bytes, signed and unsigned comparisons can lead to different
results.
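
To illustrate the point above, here is a minimal Python sketch. It assumes
the commonly used INT96 timestamp layout (8 bytes of nanoseconds within the
day followed by 4 bytes of Julian day, both little endian); that layout and
the helper names encode_int96, compare_unsigned and compare_signed are
assumptions made for illustration only and are not taken from parquet-mr.

```python
import struct


def encode_int96(julian_day: int, nanos_of_day: int) -> bytes:
    # Assumed layout: 8-byte little-endian nanoseconds of the day,
    # then 4-byte little-endian Julian day (12 bytes in total).
    return struct.pack("<qi", nanos_of_day, julian_day)


def compare_unsigned(a: bytes, b: bytes) -> int:
    # Python compares bytes objects lexicographically as unsigned values.
    return (a > b) - (a < b)


def compare_signed(a: bytes, b: bytes) -> int:
    # Reinterpret every byte as a signed 8-bit value before comparing.
    sa = [x - 256 if x >= 128 else x for x in a]
    sb = [x - 256 if x >= 128 else x for x in b]
    return (sa > sb) - (sa < sb)


DAY = 2458000  # an arbitrary Julian day, used for both timestamps

# First bytes 0x7F and 0x80: the two comparisons disagree.
earlier = encode_int96(DAY, 0x7F)
later = encode_int96(DAY, 0x80)
print(compare_unsigned(earlier, later), compare_signed(earlier, later))

# First bytes 0xFF and 0x00: the later timestamp has the smaller first byte,
# so the unsigned byte-wise comparison ranks it before the earlier one.
earlier2 = encode_int96(DAY, 0xFF)
later2 = encode_int96(DAY, 0x100)
print(compare_unsigned(earlier2, later2), compare_signed(earlier2, later2))
```

In both pairs the signed and unsigned byte-wise comparisons return opposite
results, and in the second pair the unsigned comparison places the later
timestamp first, so neither byte-wise ordering can be relied on to reflect
chronological order.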
> Deprecate type-defined sort ordering for INT96 type
> ---------------------------------------------------
>
> Key: PARQUET-1065
> URL: https://issues.apache.org/jira/browse/PARQUET-1065
> Project: Parquet
> Issue Type: Bug
> Reporter: Zoltan Ivanfi
> Assignee: Zoltan Ivanfi
>
> [parquet.thrift in
> parquet-format|https://github.com/apache/parquet-format/blob/041708da1af52e7cb9288c331b542aa25b68a2b6/src/main/thrift/parquet.thrift#L37]
> defines the sort order for INT96 to be signed.
> [ParquetMetadataConverter.java in
> parquet-mr|https://github.com/apache/parquet-mr/blob/352b906996f392030bfd53b93e3cf4adb78d1a55/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java#L422]
> uses unsigned ordering instead. In practice, INT96 is only used for
> timestamps and neither signed nor unsigned ordering of the numeric values is
> correct for this purpose. For this reason, the INT96 sort order should be
> specified as undefined.
> (As a special case, min == max signifies that all values are the same, and
> can be considered valid even for undefined orderings.)
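
The special case in the description could be sketched as follows. This is
only an illustration of the idea: the function name single_int96_value and
its arguments are hypothetical and do not correspond to any parquet-mr API.

```python
from typing import Optional


def single_int96_value(min_value: Optional[bytes],
                       max_value: Optional[bytes]) -> Optional[bytes]:
    # With an undefined sort order, min and max cannot be used as range
    # bounds. If they are byte-for-byte equal, though, every value in the
    # column chunk is that same value, regardless of ordering.
    if min_value is not None and min_value == max_value:
        return min_value
    return None  # statistics carry no usable information
```

A reader applying this rule would ignore INT96 min/max statistics entirely
unless the function returns a value.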
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)