[
https://issues.apache.org/jira/browse/FLINK-26277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496076#comment-17496076
]
Ryan Skraba commented on FLINK-26277:
-------------------------------------
It might be worthwhile refactoring the implementation for clarity, but it's
actually correct: {{readDataBuffer}} sets the byte order to LITTLE_ENDIAN,
so we're reading the numbers from the "other side": the 8-byte nanosOfDay
comes first in the buffer, followed by the 4-byte julianDay.
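
To make the byte layout concrete, here is a minimal, self-contained sketch. It
does not call Flink's {{readDataBuffer}}; the class {{Int96LayoutDemo}}, the
{{encode}} helper and the sample values are made up for illustration. It writes
nanosOfDay (8 bytes) and then julianDay (4 bytes) in little-endian order and
reads them back in the same order the reader uses.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Int96LayoutDemo {
    // Hypothetical helper (not part of Flink): lays out an INT96 timestamp as
    // nanosOfDay (8 bytes) followed by julianDay (4 bytes), both little-endian.
    static byte[] encode(long nanosOfDay, int julianDay) {
        return ByteBuffer.allocate(12)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(nanosOfDay)
                .putInt(julianDay)
                .array();
    }

    public static void main(String[] args) {
        // 2440588 is the Julian day number of 1970-01-01.
        byte[] raw = encode(1_000L, 2_440_588);

        ByteBuffer buffer = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buffer.getLong(); // bytes 0..7
        int julianDay = buffer.getInt();    // bytes 8..11

        System.out.println(nanosOfDay + " " + julianDay);
    }
}
```

Viewed as a single 96-bit little-endian integer, julianDay occupies the most
significant 4 bytes, which may be what the "julianDay(4) + nanosOfDay(8)"
wording in the Javadoc refers to.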
As an aside, it seems that Parquet would prefer everyone to use INT64 logical
types for timestamps, and deprecated INT96 quite a while ago (PARQUET-323).
There is a Jira to update Flink.
> Java docs & implementation of TimestampColumnReader are contradicting
> ---------------------------------------------------------------------
>
> Key: FLINK-26277
> URL: https://issues.apache.org/jira/browse/FLINK-26277
> Project: Flink
> Issue Type: Bug
> Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
> Affects Versions: 1.15.0
> Reporter: Caizhi Weng
> Priority: Major
>
> (Not sure if this should be classified as a bug, but I don't see a more
> proper type.)
> The Javadoc of {{TimestampColumnReader}} states:
> {code:java}
> /**
>  * Timestamp {@link ColumnReader}. We only support INT96 bytes now,
>  * julianDay(4) + nanosOfDay(8).
>  * See https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#timestamp
>  * TIMESTAMP_MILLIS and TIMESTAMP_MICROS are the deprecated ConvertedType.
>  */
> {code}
> However, the implementation reads:
> {code:java}
> ByteBuffer buffer = readDataBuffer(12);
> column.setTimestamp(
>         rowId + i,
>         int96ToTimestamp(utcTimestamp, buffer.getLong(), buffer.getInt()));
> {code}
> This implementation contradicts the Javadoc because {{nanosOfDay(8)}}
> actually precedes {{julianDay(4)}}.
> The implementation is also confusing because it relies on the evaluation
> order of the argument list. Although the [Java Language
> Specification|https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.7.4]
> specifies that argument lists are evaluated from left to right, this is not
> true of other languages (for example, C++ leaves the order unspecified and
> may evaluate the arguments in any order).
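
One way to remove the dependence on argument evaluation order is to read each
field into a named local first. The sketch below is only an illustration under
stated assumptions: {{ExplicitReadOrder}} and {{Int96Parts}} are hypothetical
names, not Flink classes, and the buffer is assumed to already be in
little-endian order, as {{readDataBuffer}} is said to provide.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ExplicitReadOrder {
    // Hypothetical holder for the two INT96 fields; not a Flink class.
    static final class Int96Parts {
        final long nanosOfDay;
        final int julianDay;

        Int96Parts(long nanosOfDay, int julianDay) {
            this.nanosOfDay = nanosOfDay;
            this.julianDay = julianDay;
        }
    }

    // Assumes the buffer is little-endian and positioned at a 12-byte INT96 value.
    static Int96Parts read(ByteBuffer buffer) {
        // Each read is its own statement, so the order is visible in the code
        // rather than implied by how the argument list is evaluated.
        long nanosOfDay = buffer.getLong();
        int julianDay = buffer.getInt();
        return new Int96Parts(nanosOfDay, julianDay);
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
        buffer.putLong(42L).putInt(2_440_588);
        buffer.flip(); // switch from writing to reading

        Int96Parts parts = read(buffer);
        System.out.println(parts.nanosOfDay + " " + parts.julianDay);
    }
}
```

In the reader itself, the two locals could then be passed to
{{int96ToTimestamp}} in the order its parameters expect.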
--
This message was sent by Atlassian Jira
(v8.20.1#820001)