Jeffrey Charles created FLINK-21350:
---------------------------------------

             Summary: ParquetInputFormat incorrectly interprets timestamps 
encoded in microseconds as timestamps encoded in milliseconds
                 Key: FLINK-21350
                 URL: https://issues.apache.org/jira/browse/FLINK-21350
             Project: Flink
          Issue Type: Bug
          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
    Affects Versions: 1.12.1, 1.12.0
            Reporter: Jeffrey Charles


Given a Parquet file whose schema has a field with a physical type of INT64 and 
a logical type of TIMESTAMP_MICROS, all of the ParquetInputFormat sub-classes 
deserialize the timestamp as a value tens of thousands of years in the future.
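
For a concrete sense of the scale of the error: treating a microsecond epoch 
value as milliseconds inflates it by a factor of 1,000, which pushes dates from 
2021 to roughly the year 53,000. A minimal, self-contained sketch (the sample 
value below is illustrative, not taken from the affected code):

{code:java}
import java.sql.Timestamp;

public class MicrosAsMillisDemo {
    public static void main(String[] args) {
        // Epoch value in MICROseconds for 2021-02-11T00:00:00Z.
        long micros = 1_613_001_600_000_000L;

        // Buggy path: microseconds fed to the millisecond-based
        // Timestamp constructor land around the year 53,000.
        System.out.println(new Timestamp(micros));

        // Converting to milliseconds first yields 2021-02-11 as expected.
        System.out.println(new Timestamp(micros / 1_000L));
    }
}
{code}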

Looking at the code in 
[https://github.com/apache/flink/blob/release-1.12.1/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/utils/RowConverter.java#L326],
 it looks to me like the row converter interprets the field value as if it 
contained milliseconds rather than microseconds. Specifically, both the 
millisecond and microsecond cases share the same code path, which instantiates 
a java.sql.Timestamp whose constructor expects a millisecond value, and the 
microsecond case statement passes it a value in microseconds. I tested a change 
locally that divides the value by 1,000 in the microseconds case statement, and 
that produces a timestamp with the expected value.
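
For illustration, here is a minimal sketch of the shape of that fix. The class 
and method names are hypothetical, not the actual code in RowConverter.java; 
only the split between the two case statements reflects the change I tested:

{code:java}
import java.sql.Timestamp;
import org.apache.parquet.schema.OriginalType;

// Hypothetical sketch; names are illustrative, not Flink's actual converter.
final class TimestampConversionSketch {
    static Timestamp convert(long value, OriginalType originalType) {
        switch (originalType) {
            case TIMESTAMP_MILLIS:
                // Already in milliseconds; pass straight through.
                return new Timestamp(value);
            case TIMESTAMP_MICROS:
                // Convert microseconds to milliseconds before calling
                // the millisecond-based Timestamp constructor.
                return new Timestamp(value / 1_000L);
            default:
                throw new IllegalArgumentException(
                        "Unsupported original type: " + originalType);
        }
    }
}
{code}

Note that the integer division drops any sub-millisecond precision; preserving 
full microsecond precision would require setting the fractional part 
separately, e.g. via Timestamp.setNanos.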


