amogh-jahagirdar commented on code in PR #14081:
URL: https://github.com/apache/iceberg/pull/14081#discussion_r2363551105
##########
spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/SparkTestHelperBase.java:
##########
@@ -79,6 +80,12 @@ protected void assertEquals(String context, Object[] expectedRow, Object[] actua
       } else {
         assertEquals(newContext, (Object[]) expectedValue, (Object[]) actualValue);
       }
+    } else if (expectedValue instanceof VariantVal && actualValue instanceof VariantVal) {
+      // Spark VariantVal comparison is based on raw byte[] comparison, which can fail
+      // if Spark uses trailing null bytes. so, we compare their JSON representation instead.
+      assertThat((actualValue).toString())
+          .as("%s contents should match (VariantVal JSON)", context)
+          .isEqualTo((expectedValue).toString());

Review Comment:
   Discussed offline, I think it's reasonable to compare the JSON representation to determine if the logical variant is equivalent.

##########
data/src/test/java/org/apache/iceberg/data/TestMetricsRowGroupFilter.java:
##########
@@ -988,6 +1004,67 @@ public void testTransformFilter() {
         .isTrue();
   }

+  @TestTemplate
+  public void testVariantFieldMixedValuesNotNull() throws IOException {
+    assumeThat(format).isEqualTo(FileFormat.PARQUET);
+
+    List<GenericRecord> records = Lists.newArrayList();
+    for (int i = 0; i < 10; i++) {
+      GenericRecord record = GenericRecord.create(VARIANT_SCHEMA);
+      record.setField("id", i);
+      if (i % 2 == 0) {
+        VariantMetadata metadata = Variants.metadata("field");
+        ShreddedObject obj = Variants.object(metadata);
+        obj.put("field", Variants.of("value" + i));
+        Variant variant = Variant.of(metadata, obj);
+        record.setField("variant_field", variant);
+      }
+      records.add(record);
+    }
+
+    File parquetFile = writeParquetFile("test-variant", VARIANT_SCHEMA, records);
+    InputFile inFile = Files.localInput(parquetFile);
+    try (ParquetFileReader reader = ParquetFileReader.open(parquetInputFile(inFile))) {
+      BlockMetaData blockMetaData = reader.getRowGroups().get(0);
+      MessageType fileSchema = reader.getFileMetaData().getSchema();

Review Comment:
   Non-blocking nit: I also feel like we can abstract reading the parquet block metadata/schema in a helper too.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
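The first review comment turns on the fact that raw `byte[]` equality can distinguish two buffers that represent the same logical value when one carries trailing null bytes. A minimal stand-alone sketch of that failure mode (the byte arrays and the `trimTrailingNulls` helper are made up for illustration; they are not the real variant encoding or Spark API):

```java
import java.util.Arrays;

public class TrailingNullDemo {
  // Hypothetical canonicalization: strip trailing null bytes.
  static byte[] trimTrailingNulls(byte[] bytes) {
    int end = bytes.length;
    while (end > 0 && bytes[end - 1] == 0) {
      end--;
    }
    return Arrays.copyOf(bytes, end);
  }

  public static void main(String[] args) {
    // Two buffers for the same logical value; one is padded with a trailing null byte.
    byte[] plain = {0x01, 0x02, 0x03};
    byte[] padded = {0x01, 0x02, 0x03, 0x00};

    // Raw byte comparison reports a difference...
    System.out.println(Arrays.equals(plain, padded)); // false

    // ...but comparing a canonical form does not. The PR achieves the same
    // effect by comparing the VariantVal JSON (toString) representations.
    System.out.println(Arrays.equals(plain, trimTrailingNulls(padded))); // true
  }
}
```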
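The second comment suggests abstracting the row-group metadata/schema reading into a helper. One possible shape for that helper, assuming the test class's existing `parquetInputFile` conversion from an Iceberg `InputFile` to a Parquet one (the helper name, record, and placement are hypothetical suggestions, not part of the PR):

```java
import java.io.IOException;
import org.apache.iceberg.io.InputFile;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.metadata.BlockMetaData;
import org.apache.parquet.schema.MessageType;

// Hypothetical helper for TestMetricsRowGroupFilter and similar tests.
final class ParquetRowGroupTestUtil {

  // Bundles the first row group's metadata with the Parquet file schema.
  record FirstRowGroup(BlockMetaData blockMetaData, MessageType fileSchema) {}

  static FirstRowGroup readFirstRowGroup(InputFile inFile) throws IOException {
    // parquetInputFile(..) is the existing test helper that adapts the
    // Iceberg InputFile to Parquet's InputFile interface.
    try (ParquetFileReader reader = ParquetFileReader.open(parquetInputFile(inFile))) {
      // Row-group metadata and the schema stay valid after the reader closes.
      return new FirstRowGroup(
          reader.getRowGroups().get(0), reader.getFileMetaData().getSchema());
    }
  }

  private ParquetRowGroupTestUtil() {}
}
```

Each test body then shrinks to a single `readFirstRowGroup(inFile)` call instead of repeating the try-with-resources boilerplate.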