[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #3533: Arrow: Fix vectorized position reader

GitBox Thu, 11 Nov 2021 11:35:07 -0800


RussellSpitzer commented on a change in pull request #3533:
URL: https://github.com/apache/iceberg/pull/3533#discussion_r747766166




##########
File path: spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkMetadataColumns.java
##########
@@ -158,6 +168,62 @@ public void testSpecAndPartitionMetadataColumns() {
         sql("SELECT _spec_id, _partition FROM %s ORDER BY _spec_id", TABLE_NAME));
   }
 
+  @Test
+  public void testPositionMetadataColumnWithMultipleRowGroups() throws NoSuchTableException {
+    Assume.assumeTrue(fileFormat == FileFormat.PARQUET);
+
+    table.updateProperties()
+        .set(PARQUET_ROW_GROUP_SIZE_BYTES, "100")
+        .commit();
+
+    List<Long> ids = Lists.newArrayList();
+    for (long id = 0L; id < 200L; id++) {
+      ids.add(id);
+    }
+    Dataset<Row> df = spark.createDataset(ids, Encoders.LONG())
+        .withColumnRenamed("value", "id")
+        .withColumn("category", lit("hr"))
+        .withColumn("data", lit("ABCDEF"));
+    df.coalesce(1).writeTo(TABLE_NAME).append();
+
+    Assert.assertEquals(200, spark.table(TABLE_NAME).count());
+
+    List<Object[]> expectedRows = ids.stream()
+        .map(this::row)
+        .collect(Collectors.toList());
+    assertEquals("Rows must match",
+        expectedRows,
+        sql("SELECT _pos FROM %s", TABLE_NAME));
+  }
+
+  @Test
+  public void testPositionMetadataColumnWithMultipleBatches() throws NoSuchTableException {

Review comment:
       This does make me think we should add a test suite somewhere that just
compares vectorized and non-vectorized reads and checks that the results match
across a variety of reads and schemas.
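
       A rough sketch of what such a parity check could look like, in plain
Java so it stands alone. Everything here is hypothetical: `VectorizedParityCheck`,
`ReadPath`, `findMismatches` and `fakePositions` are illustrative names, not
Iceberg or Spark APIs, and the faulty reader is a made-up failure mode used only
to show the check firing. In a real suite each `ReadPath` would presumably run
the query against the same table with vectorization switched on or off.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical harness; none of these names come from Iceberg or Spark.
final class VectorizedParityCheck {

  // Stand-in for "run this query through one read path and return its rows".
  interface ReadPath {
    List<List<Object>> read(String query);
  }

  // Returns the queries whose results differ between the two read paths.
  static List<String> findMismatches(
      List<String> queries, ReadPath vectorized, ReadPath nonVectorized) {
    List<String> mismatches = new ArrayList<>();
    for (String query : queries) {
      List<List<Object>> expected = nonVectorized.read(query);
      List<List<Object>> actual = vectorized.read(query);
      if (!expected.equals(actual)) {
        mismatches.add(query);
      }
    }
    return mismatches;
  }

  // Fake read path for the demo: emits positions 0..rowCount-1, optionally
  // restarting the counter every batchSize rows to imitate a faulty reader.
  static ReadPath fakePositions(int rowCount, int batchSize, boolean restartPerBatch) {
    return query -> {
      List<List<Object>> rows = new ArrayList<>();
      for (long pos = 0; pos < rowCount; pos++) {
        long value = restartPerBatch ? pos % batchSize : pos;
        rows.add(List.<Object>of(value));
      }
      return rows;
    };
  }

  public static void main(String[] args) {
    List<String> queries =
        List.of("SELECT _pos FROM t", "SELECT _pos FROM t WHERE id > 10");
    ReadPath rowByRow = fakePositions(200, 50, false);
    ReadPath faultyVectorized = fakePositions(200, 50, true);
    // Identical read paths should report no mismatching queries.
    System.out.println(findMismatches(queries, rowByRow, rowByRow));
    // The faulty path should be flagged for every query.
    System.out.println(findMismatches(queries, faultyVectorized, rowByRow));
  }
}
```

       Rows are modelled as `List<Object>` rather than `Object[]` so that
`equals` compares contents; with arrays the comparison would need a helper such
as `Arrays.deepEquals`. The sketch also assumes both paths return rows in the
same order, which a real suite would have to guarantee (for example with an
ORDER BY).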




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


