[ 
https://issues.apache.org/jira/browse/FLINK-26277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496076#comment-17496076
 ] 

Ryan Skraba commented on FLINK-26277:
-------------------------------------

It might be worthwhile refactoring the implementation for clarity, but it's 
actually correct: {{readDataBuffer}} sets the byte order to LITTLE_ENDIAN, so 
we're reading the numbers from the "other side" of the INT96 value: the 8-byte 
nanosOfDay comes first in the buffer, followed by the 4-byte julianDay.
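
To illustrate the point, here is a minimal, self-contained sketch. It is not 
the Flink code: the class {{Int96LayoutSketch}} and its helper methods are 
hypothetical names made up for this example, and it assumes the usual INT96 
layout of nanosOfDay (8 bytes) followed by julianDay (4 bytes), both 
little-endian. It shows that reading the buffer in stored order with 
LITTLE_ENDIAN, and reading the reversed bytes with BIG_ENDIAN (where julianDay 
comes first, as the Javadoc describes), give the same pair of values. It also 
uses explicit locals so the read order does not depend on argument evaluation.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Int96LayoutSketch {

    /** Writes nanosOfDay (8 bytes) then julianDay (4 bytes), both little-endian. */
    static byte[] encode(long nanosOfDay, int julianDay) {
        return ByteBuffer.allocate(12)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(nanosOfDay)
                .putInt(julianDay)
                .array();
    }

    /** Reads in stored order with explicit locals; returns {nanosOfDay, julianDay}. */
    static long[] decodeLittleEndian(byte[] int96) {
        ByteBuffer buffer = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buffer.getLong(); // first 8 bytes
        int julianDay = buffer.getInt();    // last 4 bytes
        return new long[] {nanosOfDay, julianDay};
    }

    /** Reads from the "other side": reversed bytes, big-endian, julianDay first. */
    static long[] decodeReversedBigEndian(byte[] int96) {
        byte[] reversed = new byte[int96.length];
        for (int i = 0; i < int96.length; i++) {
            reversed[i] = int96[int96.length - 1 - i];
        }
        ByteBuffer buffer = ByteBuffer.wrap(reversed); // BIG_ENDIAN is the default
        int julianDay = buffer.getInt();    // most-significant 4 bytes
        long nanosOfDay = buffer.getLong(); // remaining 8 bytes
        return new long[] {nanosOfDay, julianDay};
    }

    public static void main(String[] args) {
        // One second after midnight on 1970-01-01 (Julian day 2440588).
        byte[] raw = encode(1_000_000_000L, 2_440_588);
        long[] le = decodeLittleEndian(raw);
        long[] be = decodeReversedBigEndian(raw);
        System.out.println(le[0] + " " + le[1]);
        System.out.println(be[0] + " " + be[1]);
    }
}
```

Both lines printed by {{main}} carry the same nanosOfDay and julianDay, which 
is why the Javadoc and the implementation can both be right while describing 
the bytes from opposite ends.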

As an aside, Parquet seems to prefer that everyone use INT64 logical types 
for timestamps, and deprecated INT96 quite a while ago (PARQUET-323). 
There is a Jira to update Flink.

 

> Java docs & implementation of TimestampColumnReader are contradicting
> ---------------------------------------------------------------------
>
>                 Key: FLINK-26277
>                 URL: https://issues.apache.org/jira/browse/FLINK-26277
>             Project: Flink
>          Issue Type: Bug
>          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>    Affects Versions: 1.15.0
>            Reporter: Caizhi Weng
>            Priority: Major
>
> (Not sure if this should be classified as a bug, but I don't see a more 
> proper type.)
> The Java docs of {{TimestampColumnReader}} state that
> {code:java}
> /**
>  * Timestamp {@link ColumnReader}. We only support INT96 bytes now, julianDay(4) + nanosOfDay(8).
>  * See https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#timestamp
>  * TIMESTAMP_MILLIS and TIMESTAMP_MICROS are the deprecated ConvertedType.
>  */
> {code}
> However the implementation goes like this
> {code:java}
> ByteBuffer buffer = readDataBuffer(12);
> column.setTimestamp(
>         rowId + i,
>         int96ToTimestamp(utcTimestamp, buffer.getLong(), buffer.getInt()));
> {code}
> This implementation contradicts the Java docs because {{nanosOfDay(8)}} 
> actually precedes {{julianDay(4)}}.
> This implementation is also confusing because it relies on the evaluation 
> order of the argument list. Although the 
> [Java Language Specification|https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.7.4] 
> specifies that argument lists are evaluated from left to right, this does 
> not hold for other languages (for example, C++ leaves the order unspecified 
> and may evaluate the arguments in any order).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
