@HuangZhenQiu 
Here is the schema of that parquet file, printed in Zeppelin.
> root
>  |-- metrics_date: timestamp (nullable = true)
>  |-- counter: long (nullable = true)
>  |-- meter: double (nullable = true)
>  |-- customer_id: string (nullable = true)
I have also attached the sample file here:
https://github.com/lvhuyen/flink/blob/parquet_input_format(7243)/flink-formats/flink-parquet/src/test/resources/test.parquet

I debugged this in IntelliJ: that column is in fact stored as the primitive type int96 (not int64). In parquet-mr's [ColumnReaderImpl.java](https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/impl/ColumnReaderImpl.java), int96 is treated as a String (line 274). The conversion from the byte array into a String at line 393 of [Binary.java](https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/io/api/Binary.java) seems to be irreversible and leads to data loss. My data has metrics_date = 2018-09-01 15:02:55.0, which was read as the byte array [0, 118, -95, -103, 69, 49, 0, 0, -5, -126, 37, 0]. After line 393, I got a String of length 12 that has the same character at the 3rd, 4th, 9th, and 10th positions.
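
To make the data loss concrete, here is a minimal sketch using only the JDK (no parquet-mr classes). It assumes the file was written with the common INT96 timestamp layout used by Spark, Hive and Impala: 8 little-endian bytes of nanoseconds within the day followed by 4 little-endian bytes of Julian day number. The class and method names (`Int96TimestampSketch`, `decode`, `survivesUtf8RoundTrip`) are hypothetical and are not part of Flink or parquet-mr; this is one possible way to read the value, not necessarily how the input format should do it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.Arrays;

// Illustrative sketch only; class and method names are hypothetical.
class Int96TimestampSketch {

    // Julian day number of 1970-01-01 (the Unix epoch).
    static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;

    /**
     * Decodes a 12-byte INT96 value, ASSUMING the layout
     * [8 bytes nanos-of-day, little-endian][4 bytes Julian day, little-endian].
     * No time zone adjustment is applied.
     */
    static LocalDateTime decode(byte[] int96) {
        if (int96.length != 12) {
            throw new IllegalArgumentException("INT96 values must be exactly 12 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();              // bytes 0..7
        long julianDay = buf.getInt() & 0xFFFFFFFFL;  // bytes 8..11, read as unsigned
        LocalDate date = LocalDate.ofEpochDay(julianDay - JULIAN_DAY_OF_EPOCH);
        return LocalDateTime.of(date, LocalTime.ofNanoOfDay(nanosOfDay));
    }

    /**
     * Returns true if decoding the bytes as UTF-8 text and encoding them again
     * yields the original bytes. Malformed sequences are replaced with U+FFFD
     * during decoding, so arbitrary binary data usually does not survive.
     */
    static boolean survivesUtf8RoundTrip(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        return Arrays.equals(raw, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // The byte array observed in the debugger for metrics_date.
        byte[] raw = {0, 118, -95, -103, 69, 49, 0, 0, -5, -126, 37, 0};

        System.out.println(survivesUtf8RoundTrip(raw)); // false
        System.out.println(decode(raw));                // 2018-09-01T15:02:55
    }
}
```

The bytes at indices 2, 3, 8 and 9 (-95, -103, -5, -126) are not valid UTF-8 on their own, so each one becomes the replacement character U+FFFD. That matches the identical characters seen at the 3rd, 4th, 9th and 10th positions, and it is why the original bytes cannot be recovered from the String. Reading the raw bytes directly keeps the value intact.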

[ Full content available at: https://github.com/apache/flink/pull/6483 ]