[GitHub] [iceberg] rdblue commented on a change in pull request #3148: [ARROW] Vectorized Parquet Reads - Make ArrowReader's iterator idempotent

GitBox Sun, 19 Sep 2021 16:35:23 -0700


rdblue commented on a change in pull request #3148:
URL: https://github.com/apache/iceberg/pull/3148#discussion_r711821316




##########
File path: arrow/src/test/java/org/apache/iceberg/arrow/vectorized/ArrowReaderTest.java
##########
@@ -353,6 +371,29 @@ private void readAndCheckArrowResult(
     assertEquals(expectedTotalRows, totalRows);
   }
 
+  private void readAndCheckHasNextIsIdempotent(
+      TableScan scan,
+      int numRowsPerRoot,
+      int expectedTotalRows,
+      int numExtraCallsToHasNext) throws IOException {
+    int totalRows = 0;
+    try (VectorizedTableScanIterable itr = new VectorizedTableScanIterable(scan, numRowsPerRoot, false)) {
+      CloseableIterator<ColumnarBatch> iterator = itr.iterator();
+      while (iterator.hasNext()) {
+        // Call hasNext() a few extra times.
+        // This should not affect the total number of rows read.
+        for (int i = 0; i < numExtraCallsToHasNext; i++) {
+          assertTrue(iterator.hasNext());
+        }
+
+        ColumnarBatch batch = iterator.next();
+        VectorSchemaRoot root = batch.createVectorSchemaRootFromVectors();
+        totalRows += root.getRowCount();

Review comment:
       How does this test that the iterator is idempotent? It looks like this 
just tests that the batch size is correct. I think that this should also call 
`checkAllVectorValues` to ensure that the expected rows are the ones produced 
rather than relying on the total number of rows.
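
A minimal sketch of the kind of check suggested here, comparing the values read against the expected rows instead of only counting them. It uses plain Java collections so that it stands alone: `BatchIterator`, `readWithExtraHasNextCalls`, and the sample data are hypothetical stand-ins, not Iceberg or Arrow APIs, and it does not reproduce what `checkAllVectorValues` does in the test class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class IdempotentHasNextSketch {

  /** Hypothetical stand-in for a batch reader: hands out fixed-size slices of a list. */
  static final class BatchIterator implements Iterator<List<Integer>> {
    private final List<Integer> rows;
    private final int batchSize;
    private int position = 0;

    BatchIterator(List<Integer> rows, int batchSize) {
      this.rows = rows;
      this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
      // Only inspects state and never advances the position,
      // so repeated calls return the same answer.
      return position < rows.size();
    }

    @Override
    public List<Integer> next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      int end = Math.min(position + batchSize, rows.size());
      List<Integer> batch = new ArrayList<>(rows.subList(position, end));
      position = end;
      return batch;
    }
  }

  /** Reads every batch, calling hasNext() extra times before each next(). */
  static List<Integer> readWithExtraHasNextCalls(
      List<Integer> rows, int batchSize, int extraCalls) {
    Iterator<List<Integer>> iterator = new BatchIterator(rows, batchSize);
    List<Integer> actual = new ArrayList<>();
    while (iterator.hasNext()) {
      for (int i = 0; i < extraCalls; i++) {
        if (!iterator.hasNext()) {
          throw new AssertionError("hasNext() changed its answer");
        }
      }
      actual.addAll(iterator.next());
    }
    return actual;
  }

  public static void main(String[] args) {
    List<Integer> expected = Arrays.asList(10, 20, 30, 40, 50);
    List<Integer> actual = readWithExtraHasNextCalls(expected, 2, 3);
    // Compare the values themselves, in order, not just the row count.
    if (!expected.equals(actual)) {
      throw new AssertionError("expected " + expected + " but read " + actual);
    }
    System.out.println("rows match: " + actual);
  }
}
```

Comparing the full list also catches rows that are repeated or reordered, which a check on the total alone could miss whenever the count happens to stay the same.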




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


