RussellSpitzer commented on a change in pull request #3533:
URL: https://github.com/apache/iceberg/pull/3533#discussion_r747766166
##########
File path:
spark/v3.2/spark/src/test/java/org/apache/iceberg/spark/source/TestSparkMetadataColumns.java
##########
@@ -158,6 +168,62 @@ public void testSpecAndPartitionMetadataColumns() {
sql("SELECT _spec_id, _partition FROM %s ORDER BY _spec_id", TABLE_NAME));
}
+ @Test
+ public void testPositionMetadataColumnWithMultipleRowGroups() throws NoSuchTableException {
+ Assume.assumeTrue(fileFormat == FileFormat.PARQUET);
+
+ table.updateProperties()
+ .set(PARQUET_ROW_GROUP_SIZE_BYTES, "100")
+ .commit();
+
+ List<Long> ids = Lists.newArrayList();
+ for (long id = 0L; id < 200L; id++) {
+ ids.add(id);
+ }
+ Dataset<Row> df = spark.createDataset(ids, Encoders.LONG())
+ .withColumnRenamed("value", "id")
+ .withColumn("category", lit("hr"))
+ .withColumn("data", lit("ABCDEF"));
+ df.coalesce(1).writeTo(TABLE_NAME).append();
+
+ Assert.assertEquals(200, spark.table(TABLE_NAME).count());
+
+ List<Object[]> expectedRows = ids.stream()
+ .map(this::row)
+ .collect(Collectors.toList());
+ assertEquals("Rows must match",
+ expectedRows,
+ sql("SELECT _pos FROM %s", TABLE_NAME));
+ }
+
+ @Test
+ public void testPositionMetadataColumnWithMultipleBatches() throws NoSuchTableException {
Review comment:
This does make me think we should add a test suite somewhere that just
compares "Vectorized vs NonVectorized" reads and checks that both return
the same rows across a variety of reads/schemas.
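One way to picture that suite is a differential check: read the same file through both paths, vary the file layout (rows per row group) and the batch size, and require identical rows every time. The sketch below is a toy, plain-Java model of the idea, not Iceberg code. `readRowByRow`, `readBatched`, `layout`, and `sameRows` are hypothetical stand-ins for the non-vectorized and vectorized Parquet readers. The model assumes `_pos` is the row group's starting row offset plus the row's index within that group, so a reader that restarts `_pos` per row group or per batch would fail the check. In a real Spark suite each case would presumably read the table twice with vectorization toggled (for example via the `read.parquet.vectorization.enabled` table property) and compare the results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Toy differential check: a row-at-a-time reader and a batched ("vectorized")
 * reader must return identical (_pos, value) rows for the same file layout.
 * All names here are hypothetical stand-ins, not Iceberg classes.
 */
public class VectorizedVsRowReadCheck {

  /** Reads one row at a time; _pos = row group start offset + index within the group. */
  public static List<long[]> readRowByRow(long[][] rowGroups) {
    List<long[]> rows = new ArrayList<>();
    long rowGroupStart = 0;
    for (long[] group : rowGroups) {
      for (int i = 0; i < group.length; i++) {
        rows.add(new long[] {rowGroupStart + i, group[i]});
      }
      rowGroupStart += group.length;
    }
    return rows;
  }

  /** Reads in batches of at most batchSize rows; a batch never spans two row groups. */
  public static List<long[]> readBatched(long[][] rowGroups, int batchSize) {
    List<long[]> rows = new ArrayList<>();
    long rowGroupStart = 0;
    for (long[] group : rowGroups) {
      for (int batchStart = 0; batchStart < group.length; batchStart += batchSize) {
        int batchEnd = Math.min(batchStart + batchSize, group.length);
        // _pos keeps counting from the row group's start; it must not restart per batch.
        for (int i = batchStart; i < batchEnd; i++) {
          rows.add(new long[] {rowGroupStart + i, group[i]});
        }
      }
      rowGroupStart += group.length;
    }
    return rows;
  }

  /** True when both readers produced the same rows in the same order. */
  public static boolean sameRows(List<long[]> expected, List<long[]> actual) {
    if (expected.size() != actual.size()) {
      return false;
    }
    for (int i = 0; i < expected.size(); i++) {
      if (!Arrays.equals(expected.get(i), actual.get(i))) {
        return false;
      }
    }
    return true;
  }

  /** Splits ids 0..rowCount-1 into row groups of rowsPerGroup rows (the last may be smaller). */
  public static long[][] layout(int rowCount, int rowsPerGroup) {
    int groupCount = (rowCount + rowsPerGroup - 1) / rowsPerGroup;
    long[][] groups = new long[groupCount][];
    for (int g = 0; g < groupCount; g++) {
      int start = g * rowsPerGroup;
      int size = Math.min(rowsPerGroup, rowCount - start);
      groups[g] = new long[size];
      for (int i = 0; i < size; i++) {
        groups[g][i] = start + i;
      }
    }
    return groups;
  }

  public static void main(String[] args) {
    // Vary both the file layout and the batch size, as the suite would vary reads/schemas.
    int[] rowsPerGroup = {1, 7, 50, 200};
    int[] batchSizes = {1, 3, 64, 1000};
    for (int perGroup : rowsPerGroup) {
      long[][] file = layout(200, perGroup);
      List<long[]> expected = readRowByRow(file);
      for (int batchSize : batchSizes) {
        if (!sameRows(expected, readBatched(file, batchSize))) {
          throw new AssertionError("mismatch: rowsPerGroup=" + perGroup + ", batchSize=" + batchSize);
        }
      }
    }
    System.out.println("all layouts match");
  }
}
```

Treating the row-at-a-time path as the oracle keeps each case cheap to write: a new layout or schema only needs to be added once, and both paths are checked against each other automatically.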
--
This is an automated message from the Apache Git Service.