kingeasternsun commented on a change in pull request #3987:
URL: https://github.com/apache/iceberg/pull/3987#discussion_r802250045



##########
File path: 
flink/v1.13/flink/src/main/java/org/apache/iceberg/flink/data/FlinkParquetReaders.java
##########
@@ -321,6 +327,29 @@ public DecimalData read(DecimalData ignored) {
     }
   }
 
+  private static class TimestampInt96Reader extends ParquetValueReaders.UnboxedReader<Long> {
+    private static final long UNIX_EPOCH_JULIAN = 2_440_588L;
+
+    TimestampInt96Reader(ColumnDescriptor desc) {
+      super(desc);
+    }
+
+    @Override
+    public Long read(Long ignored) {
+      return readLong();
+    }
+
+    @Override
+    public long readLong() {
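+      // For context, a hedged sketch of what readLong() typically does for
+      // INT96 (similar in spirit to Iceberg's Spark-side INT96 reader; not
+      // necessarily this PR's exact body). The 12-byte value holds an
+      // 8-byte little-endian nanos-of-day followed by a 4-byte
+      // little-endian Julian day; `column` comes from UnboxedReader.
+      //
+      //   ByteBuffer buf = column.nextBinary().toByteBuffer().order(ByteOrder.LITTLE_ENDIAN);
+      //   long timeOfDayNanos = buf.getLong();
+      //   int julianDay = buf.getInt();
+      //   return TimeUnit.DAYS.toMicros(julianDay - UNIX_EPOCH_JULIAN)
+      //       + TimeUnit.NANOSECONDS.toMicros(timeOfDayNanos);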

Review comment:
       As for the tests, I have three ideas:
   
   ## 1
   Because the file `flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/data/TestFlinkParquetReader.java` already has the test case `testInt96TimestampProducedBySparkIsReadCorrectly`, we could add a function that reads the file back through `FlinkParquetReaders`, like below:
   ```java
     protected List<RowData> rowDatasFromFile(InputFile inputFile, Schema schema) throws IOException {
       // Read every row back through the Flink Parquet reader.
       try (CloseableIterable<RowData> reader =
           Parquet.read(inputFile)
               .project(schema)
               .createReaderFunc(type -> FlinkParquetReaders.buildReader(schema, type))
               .build()) {
         return Lists.newArrayList(reader);
       }
     }
   ```
   
   and add this check after the Spark `rowsFromFile` check:
   ```java
       List<RowData> readDataRows = rowDatasFromFile(parquetInputFile, schema);
       Assert.assertEquals(rows.size(), readDataRows.size());
   ```
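   
   Comparing only the row counts is a weak check; a value-level follow-up could look like the sketch below. This is just a sketch under assumptions: the INT96 timestamp sits in column 0 and is read at nanosecond precision, `TimestampData` is Flink's `org.apache.flink.table.data.TimestampData`, and `expectedTimestamps` is a hypothetical list captured when the rows were generated:
   ```java
       // Sketch: compare each decoded timestamp value, not just the counts.
       for (int i = 0; i < readDataRows.size(); i++) {
         TimestampData actual = readDataRows.get(i).getTimestamp(0, 9);
         Assert.assertEquals(expectedTimestamps.get(i), actual.toLocalDateTime());
       }
   ```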
   
   ## 2 
   
   In the file `flink/v1.14/flink/src/test/java/org/apache/iceberg/flink/data/TestFlinkParquetReader.java`, we could add a new test case `testInt96TimestampProducedBySparkIsReadCorrectly` that generates a parquet file with INT96 timestamps and uses `FlinkParquetReaders` to read it.
   But a new problem arises: how do we generate that parquet file with INT96 timestamps?
   - If we use `NativeSparkWriterBuilder` as `TestFlinkParquetReader.java` does, `spark-catalyst` has to be imported, which sounds weird: the iceberg-flink module would then depend on spark-catalyst.
   - Or is there a better API to generate that file? One option is sketched below.
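   
   For instance, parquet-mr's example API can write INT96 with no Spark dependency. This is only a sketch under assumptions: `Int96FileGenerator` is a made-up class name, and the 12-byte INT96 layout (8-byte little-endian nanos-of-day followed by a 4-byte little-endian Julian day) is assumed to match what the Spark writer produces:
   ```java
   import java.nio.ByteBuffer;
   import java.nio.ByteOrder;
   import java.time.LocalDateTime;
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.Path;
   import org.apache.parquet.example.data.Group;
   import org.apache.parquet.example.data.simple.SimpleGroupFactory;
   import org.apache.parquet.hadoop.ParquetWriter;
   import org.apache.parquet.hadoop.example.ExampleParquetWriter;
   import org.apache.parquet.hadoop.example.GroupWriteSupport;
   import org.apache.parquet.io.api.Binary;
   import org.apache.parquet.schema.MessageType;
   import org.apache.parquet.schema.MessageTypeParser;
   
   public class Int96FileGenerator {
     private static final long UNIX_EPOCH_JULIAN = 2_440_588L;
   
     // Encode a timestamp into the 12-byte INT96 layout: 8-byte little-endian
     // nanos-of-day, then 4-byte little-endian Julian day.
     static Binary toInt96(LocalDateTime ts) {
       ByteBuffer buf = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
       buf.putLong(ts.toLocalTime().toNanoOfDay());
       buf.putInt((int) (ts.toLocalDate().toEpochDay() + UNIX_EPOCH_JULIAN));
       return Binary.fromConstantByteArray(buf.array());
     }
   
     static void writeInt96File(String location, LocalDateTime... timestamps) throws Exception {
       MessageType schema = MessageTypeParser.parseMessageType("message test { required int96 ts; }");
       Configuration conf = new Configuration();
       GroupWriteSupport.setSchema(schema, conf);
       SimpleGroupFactory factory = new SimpleGroupFactory(schema);
       try (ParquetWriter<Group> writer =
           ExampleParquetWriter.builder(new Path(location)).withConf(conf).build()) {
         for (LocalDateTime ts : timestamps) {
           Group group = factory.newGroup();
           group.add("ts", toInt96(ts));
           writer.write(group);
         }
       }
     }
   }
   ```
   The test could then call `rowDatasFromFile` on the generated file and assert on the decoded timestamps, keeping the whole round trip inside the flink module.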
   
   ## 3
   Just manually generate a parquet file once, keep it in the test directory, and use it only for this test?
   
   



