rdblue commented on a change in pull request #828:
URL: https://github.com/apache/iceberg/pull/828#discussion_r434840901
##########
File path: spark/src/test/java/org/apache/iceberg/spark/data/TestHelpers.java
##########
@@ -78,6 +81,48 @@ public static void assertEqualsSafe(Types.StructType struct, Record rec, Row row
     }
   }
+  public static void assertEqualsUnsafe(Types.StructType struct, List<Record> expected, ColumnarBatch batch) {
Review comment:
This method isn't used, and the main difference between it and
`assertArrowVectors` is that the other one asserts that each vector is an
`IcebergArrowColumnVector`, which I don't think is actually necessary. Is there
a reason to require a specific type instead of just validating the data in each
row?
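For illustration, a check along these lines would exercise the data through the generic `ColumnVector` interface without caring about the concrete vector class (a minimal sketch, not code from this PR; the column index and getter are placeholders):
```java
// Sketch only: read values through the generic ColumnVector API instead of
// casting each vector to IcebergArrowColumnVector.
ColumnVector vector = batch.column(0);  // placeholder: an int column
for (int r = 0; r < batch.numRows(); r++) {
  if (!vector.isNullAt(r)) {
    int actual = vector.getInt(r);
    // compare `actual` against the expected Record value for this row
  }
}
```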
Also, since `ColumnarBatch` can expose each row as an `InternalRow`, which the existing helpers already support, I was able to call the existing `assertEqualsUnsafe(StructType, Record, InternalRow)` in a loop, like this:
```java
public static void assertEqualsBatch(Types.StructType struct, List<Record> expected, ColumnarBatch batch) {
  for (int r = 0; r < batch.numRows(); r++) {
    assertEqualsUnsafe(struct, expected.get(r), batch.getRow(r));
  }
}
```
That required fixing null handling in `assertEqualsUnsafe`, which wasn't calling `isNullAt`, but once that was updated, everything works without these two fairly large methods. I'd prefer to switch to `assertEqualsBatch` unless you think that introduces a problem.
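For reference, the `isNullAt` fix is roughly the following (a sketch of the idea, not the exact diff; it assumes the existing per-value `assertEqualsUnsafe(Type, Object, Object)` helper and `SparkSchemaUtil.convert`):
```java
// Sketch only: inside assertEqualsUnsafe(StructType, Record, InternalRow),
// check isNullAt before reading, so a null field in the expected Record is
// compared against SQL NULL rather than a default primitive value.
for (int i = 0; i < struct.fields().size(); i += 1) {
  Types.NestedField field = struct.fields().get(i);
  Object expectedValue = rec.getField(field.name());
  Object actualValue = row.isNullAt(i) ? null : row.get(i, SparkSchemaUtil.convert(field.type()));
  assertEqualsUnsafe(field.type(), expectedValue, actualValue);
}
```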