Caizhi Weng created FLINK-26277:
-----------------------------------
Summary: Java docs & implementation of TimestampColumnReader are
contradicting
Key: FLINK-26277
URL: https://issues.apache.org/jira/browse/FLINK-26277
Project: Flink
Issue Type: Bug
Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.15.0
Reporter: Caizhi Weng
(Not sure if this should be classified as a bug, but I don't see a more
suitable type.)
The Javadoc of {{TimestampColumnReader}} states:
{code:java}
/**
 * Timestamp {@link ColumnReader}. We only support INT96 bytes now, julianDay(4) + nanosOfDay(8).
 * See https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#timestamp
 * TIMESTAMP_MILLIS and TIMESTAMP_MICROS are the deprecated ConvertedType.
 */
{code}
However, the implementation reads:
{code:java}
ByteBuffer buffer = readDataBuffer(12);
column.setTimestamp(
        rowId + i,
        int96ToTimestamp(utcTimestamp, buffer.getLong(), buffer.getInt()));
{code}
This implementation contradicts the Javadoc: it reads the 8-byte long before
the 4-byte int, so {{nanosOfDay(8)}} actually precedes {{julianDay(4)}}.
The implementation is also confusing because it relies on the evaluation order
of the argument list. The [Java Language Specification|https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.7.4]
does guarantee that argument lists are evaluated from left to right, but this
is not true of every language (C++, for example, does not specify the order and
may evaluate the arguments in any order).
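
One possible way to make the byte order explicit is to read each field into a named local before converting. The following is only a sketch, not the Flink code: the class {{Int96TimestampSketch}} and its methods are hypothetical, the buffer is assumed to be little-endian, the result is assumed to be UTC epoch milliseconds, and 2440588 is assumed as the Julian day of 1970-01-01.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration; none of these names exist in Flink.
final class Int96TimestampSketch {
    // Assumed Julian day number of the Unix epoch (1970-01-01).
    static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;
    static final long MILLIS_PER_DAY = 86_400_000L;
    static final long NANOS_PER_MILLI = 1_000_000L;

    /** Decodes 12 bytes laid out as nanosOfDay(8) followed by julianDay(4). */
    static long int96ToEpochMillis(byte[] int96) {
        ByteBuffer buffer = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        // Each read is its own statement, so the on-disk order is visible
        // without relying on argument evaluation order.
        long nanosOfDay = buffer.getLong();
        int julianDay = buffer.getInt();
        return (julianDay - JULIAN_DAY_OF_EPOCH) * MILLIS_PER_DAY
                + nanosOfDay / NANOS_PER_MILLI;
    }

    /** Builds the same 12-byte layout, for trying out the decoder. */
    static byte[] encode(long nanosOfDay, int julianDay) {
        return ByteBuffer.allocate(12)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(nanosOfDay)
                .putInt(julianDay)
                .array();
    }

    public static void main(String[] args) {
        byte[] bytes = encode(0L, 2_440_588);
        System.out.println(int96ToEpochMillis(bytes));
    }
}
```

With the reads on separate lines, the Javadoc could simply describe the layout as nanosOfDay(8) + julianDay(4), and the code would match it line by line.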