@HuangZhenQiu Here is the schema of that parquet file, as printed in Zeppelin:

> root
> |-- metrics_date: timestamp (nullable = true)
> |-- counter: long (nullable = true)
> |-- meter: double (nullable = true)
> |-- customer_id: string (nullable = true)

I have also attached the sample file here: https://github.com/lvhuyen/flink/blob/parquet_input_format(7243)/flink-formats/flink-parquet/src/test/resources/test.parquet
I debugged this in IntelliJ. That column is in fact stored as the primitive type int96 (not int64), and in Apache's parquet-mr (https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/impl/ColumnReaderImpl.java), int96 is treated as a String (line 274). The conversion from a byte array into a String at line 393 of https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/io/api/Binary.java seems to be irreversible and leads to data loss. My data has metrics_date = 2018-09-01 15:02:55.0, which was read as the byte array [0, 118, -95, -103, 69, 49, 0, 0, -5, -126, 37, 0]. After line 393, I got a string of length 12 that has the same character at the 3rd, 4th, 9th, and 10th positions.

[ Full content available at: https://github.com/apache/flink/pull/6483 ]
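To make the observation above concrete, here is a minimal Java sketch that uses only the JDK and is not the code in this PR or in parquet-mr. It assumes the common INT96 timestamp layout written by Spark, Hive, and Impala: 8 little-endian bytes of nanoseconds within the day, followed by 4 little-endian bytes holding the Julian day number. The class and method names (`Int96TimestampSketch`, `decodeInt96`, `roundTripThroughUtf8`) are hypothetical, and no time-zone adjustment is applied. The second method assumes the String conversion behaves like a plain UTF-8 decode, which would explain why the 3rd, 4th, 9th, and 10th characters are identical: those bytes are not valid UTF-8 and are each replaced by the same replacement character.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.Arrays;

public class Int96TimestampSketch {
    // Julian day number of 1970-01-01, used to convert to an epoch day.
    private static final long JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588L;

    // Decodes a 12-byte INT96 value, assuming the layout
    // [8 bytes nanos-of-day, little-endian][4 bytes Julian day, little-endian].
    static LocalDateTime decodeInt96(byte[] int96) {
        if (int96.length != 12) {
            throw new IllegalArgumentException("INT96 must be exactly 12 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();
        int julianDay = buf.getInt();
        LocalDate date = LocalDate.ofEpochDay(julianDay - JULIAN_DAY_OF_UNIX_EPOCH);
        return LocalDateTime.of(date, LocalTime.ofNanoOfDay(nanosOfDay));
    }

    // Converts the raw bytes to a UTF-8 String and back. Bytes that are not
    // valid UTF-8 are replaced during decoding, so the original bytes are lost.
    static byte[] roundTripThroughUtf8(byte[] raw) {
        String asString = new String(raw, StandardCharsets.UTF_8);
        return asString.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The byte array reported in the comment above.
        byte[] raw = {0, 118, -95, -103, 69, 49, 0, 0, -5, -126, 37, 0};

        System.out.println(decodeInt96(raw));                               // 2018-09-01T15:02:55
        System.out.println(Arrays.equals(raw, roundTripThroughUtf8(raw)));  // false
    }
}
```

Decoding directly from the raw bytes recovers the expected timestamp, whereas the bytes that come back from the String round trip no longer match the input. This suggests the reader should take the INT96 value as raw bytes rather than going through a String.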
