[ 
https://issues.apache.org/jira/browse/PARQUET-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318529#comment-16318529
 ] 

Lars Volker commented on PARQUET-1065:
--------------------------------------

Thank you [~zi] for the explanation and the example. I got the idea of little 
endian encodings from [here in 
parquet.thrift|https://github.com/apache/parquet-format/blob/a00e770cb301506f6288d11d6532f2635a8cd349/src/main/thrift/parquet.thrift#L400],
 but that refers to the plain encoding.

I'm a bit worried that changing the ordering from "unsigned" to "undefined" 
will not reduce the confusion. Impala (and other engines) will still need to 
support reading the values and may also want to write and read statistics. Can 
we consider changing the ordering to something like "comparison of the 
represented value if used for legacy timestamps, undefined otherwise"?
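
To illustrate why a byte-wise ordering and a "represented value" ordering can 
disagree, here is a rough sketch. It assumes the commonly used legacy timestamp 
layout (8 bytes of nanoseconds within the day followed by 4 bytes of Julian day 
number, both little endian), which is not spelled out in this issue. The helper 
names encode_int96 and decode_int96 are made up for the example and do not 
belong to any Parquet library.

```python
import struct
from typing import Tuple


def encode_int96(julian_day: int, nanos_of_day: int) -> bytes:
    # Assumed layout: 8-byte little-endian nanoseconds of day,
    # then 4-byte little-endian Julian day number (12 bytes total).
    return struct.pack("<qi", nanos_of_day, julian_day)


def decode_int96(raw: bytes) -> Tuple[int, int]:
    # Returns (julian_day, nanos_of_day) so that tuple comparison
    # orders values chronologically.
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return (julian_day, nanos_of_day)


# Hypothetical sample values: "earlier" is one day before "later".
earlier = encode_int96(2458120, 1)
later = encode_int96(2458121, 0)

values = [earlier, later]

# Python compares bytes lexicographically as unsigned bytes.
bytewise_min = min(values)

# Ordering by the represented value compares the day first, then the nanos.
value_min = min(values, key=decode_int96)

print(bytewise_min == earlier)  # False: the first nanos byte decides
print(value_min == earlier)     # True
```

Under that assumption, a min/max statistic computed with the byte-wise 
comparison would pick the chronologically later value as the minimum, whereas a 
comparison of the decoded (day, nanos) pair gives the expected result.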

> Deprecate type-defined sort ordering for INT96 type
> ---------------------------------------------------
>
>                 Key: PARQUET-1065
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1065
>             Project: Parquet
>          Issue Type: Bug
>            Reporter: Zoltan Ivanfi
>            Assignee: Zoltan Ivanfi
>
> [parquet.thrift in 
> parquet-format|https://github.com/apache/parquet-format/blob/041708da1af52e7cb9288c331b542aa25b68a2b6/src/main/thrift/parquet.thrift#L37]
>  defines the sort order for INT96 to be signed. 
> [ParquetMetadataConverter.java in 
> parquet-mr|https://github.com/apache/parquet-mr/blob/352b906996f392030bfd53b93e3cf4adb78d1a55/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java#L422]
>  uses unsigned ordering instead. In practice, INT96 is only used for 
> timestamps and neither signed nor unsigned ordering of the numeric values is 
> correct for this purpose. For this reason, the INT96 sort order should be 
> specified as undefined.
> (As a special case, min == max signifies that all values are the same, and 
> can be considered valid even for undefined orderings.)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
