rdblue commented on a change in pull request #1237:
URL: https://github.com/apache/iceberg/pull/1237#discussion_r460563787



##########
File path: flink/src/test/java/org/apache/iceberg/flink/data/TestFlinkParquetReaderWriter.java
##########
@@ -41,34 +47,193 @@
   @Rule
   public TemporaryFolder temp = new TemporaryFolder();
 
-  private void testCorrectness(Schema schema, int numRecords, Iterable<Row> iterable) throws IOException {
+  private void testCorrectness(Schema schema, int numRecords, Iterable<RowData> iterable) throws IOException {
     File testFile = temp.newFile();
     Assert.assertTrue("Delete should succeed", testFile.delete());
 
-    try (FileAppender<Row> writer = Parquet.write(Files.localOutput(testFile))
+    try (FileAppender<RowData> writer = Parquet.write(Files.localOutput(testFile))
         .schema(schema)
-        .createWriterFunc(FlinkParquetWriters::buildWriter)
+        .createWriterFunc(msgType -> FlinkParquetWriters.buildWriter(FlinkSchemaUtil.convert(schema), msgType))
         .build()) {
       writer.addAll(iterable);
     }
 
-    try (CloseableIterable<Row> reader = Parquet.read(Files.localInput(testFile))
+    try (CloseableIterable<RowData> reader = Parquet.read(Files.localInput(testFile))
         .project(schema)
        .createReaderFunc(type -> FlinkParquetReaders.buildReader(schema, type))
         .build()) {
-      Iterator<Row> expected = iterable.iterator();
-      Iterator<Row> rows = reader.iterator();
+      Iterator<RowData> expected = iterable.iterator();
+      Iterator<RowData> rows = reader.iterator();
       for (int i = 0; i < numRecords; i += 1) {
        Assert.assertTrue("Should have expected number of rows", rows.hasNext());
-        Assert.assertEquals(expected.next(), rows.next());
+        assertRowData(schema.asStruct(), expected.next(), rows.next());
       }
       Assert.assertFalse("Should not have extra rows", rows.hasNext());
     }
   }
 
+  private void assertRowData(Type type, RowData expected, RowData actual) {

Review comment:
       Can you move these to a `TestHelpers` class with methods like `assertEqualsInternal` for the internal representation?
   
   Also, the tests miss the conversions that would discard microseconds, because they generate internal representations (like `RowData`) and compare `RowData` to `RowData`. Instead, these tests should write generic records and validate each generic record against a row. See the `TestHelpers` class in Spark.
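
A minimal sketch of what such a shared helper might look like, assuming a hypothetical Flink-side `TestHelpers.assertEqualsInternal` modeled on the Spark `TestHelpers` (names, the covered type cases, and the Flink 1.11 `TimestampData` accessors are assumptions for illustration, not the actual API added in this PR):

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.util.List;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.data.TimestampData;
import org.apache.iceberg.data.Record;
import org.apache.iceberg.types.Type;
import org.apache.iceberg.types.Types;
import org.junit.Assert;

public class TestHelpers {
  private TestHelpers() {
  }

  // Validates an Iceberg generic Record (full precision) against the RowData
  // read back from the file, so any precision dropped by the writer or reader
  // (e.g. timestamp microseconds) fails the assertion. A RowData-to-RowData
  // comparison cannot catch this, because both sides go through the same
  // lossy conversion.
  public static void assertEqualsInternal(Types.StructType struct, Record expected, RowData actual) {
    List<Types.NestedField> fields = struct.fields();
    for (int i = 0; i < fields.size(); i += 1) {
      Type type = fields.get(i).type();
      Object value = expected.get(i);
      switch (type.typeId()) {
        case LONG:
          Assert.assertEquals("long should match", value, actual.getLong(i));
          break;
        case STRING:
          Assert.assertEquals("string should match", value, actual.getString(i).toString());
          break;
        case TIMESTAMP:
          // Generic records carry microsecond precision, so compare at
          // microseconds to surface a lossy conversion.
          TimestampData ts = actual.getTimestamp(i, 6);
          long actualMicros = ts.getMillisecond() * 1000 + ts.getNanoOfMillisecond() / 1000;
          Assert.assertEquals("timestamp micros should match", timestampMicros(value), actualMicros);
          break;
        default:
          // Sketch only; the real helper would also cover struct, list, map, etc.
          throw new UnsupportedOperationException("Sketch does not cover type: " + type);
      }
    }
  }

  private static long timestampMicros(Object value) {
    if (value instanceof OffsetDateTime) {
      OffsetDateTime ts = (OffsetDateTime) value;
      return ts.toEpochSecond() * 1_000_000 + ts.getNano() / 1000;
    }
    LocalDateTime ts = (LocalDateTime) value;
    return ts.toEpochSecond(ZoneOffset.UTC) * 1_000_000 + ts.getNano() / 1000;
  }
}
```

The test would then generate generic records, write them through a generic writer, read them back as `RowData` via `FlinkParquetReaders`, and assert with this helper, rather than writing and comparing `RowData` on both sides.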



