huaxingao commented on code in PR #14279:
URL: https://github.com/apache/iceberg/pull/14279#discussion_r2423248867


##########
data/src/test/java/org/apache/iceberg/data/TestMetricsRowGroupFilter.java:
##########
@@ -1048,19 +1041,40 @@ public void testVariantFieldAllNullsNotNull() throws 
IOException {
       records.add(record);
     }
 
-    File parquetFile = writeParquetFile("test-variant-nulls", VARIANT_SCHEMA, 
records);
-    InputFile inFile = Files.localInput(parquetFile);
+    boolean shouldRead = shouldReadVariant(notNull("variant_field"), records);
 
-    try (ParquetFileReader reader = 
ParquetFileReader.open(parquetInputFile(inFile))) {
-      BlockMetaData blockMetaData = reader.getRowGroups().get(0);
-      MessageType fileSchema = reader.getFileMetaData().getSchema();
-      ParquetMetricsRowGroupFilter rowGroupFilter =
-          new ParquetMetricsRowGroupFilter(VARIANT_SCHEMA, 
notNull("variant_field"), true);
+    assertThat(shouldRead)
+        .as("Should read: variant notNull filters must be evaluated post scan 
even for all nulls")
+        .isTrue();
+  }
 
-      assertThat(rowGroupFilter.shouldRead(fileSchema, blockMetaData))
-          .as("Should read: variant notNull filters must be evaluated post 
scan even for all nulls")
-          .isTrue();
-    }
+  @TestTemplate
+  public void testVariantFieldEq() throws IOException {
+    assumeThat(format).isEqualTo(FileFormat.PARQUET);
+
+    VariantMetadata md = Variants.metadata("k");
+    Variant v0 = createVariantWithKey(md, "v0");
+    List<GenericRecord> records = createVariantRecords(v0);
+
+    boolean shouldRead = shouldReadVariant(equal("variant_field", v0), 
records);
+    assertThat(shouldRead)
+        .as("Should read: variant eq filters must be evaluated post scan")
+        .isTrue();
+  }
+
+  @TestTemplate
+  public void testVariantFieldIn() throws IOException {
+    assumeThat(format).isEqualTo(FileFormat.PARQUET);
+
+    VariantMetadata md = Variants.metadata("k");
+    Variant v0 = createVariantWithKey(md, "v0");
+    Variant v1 = createVariantWithKey(md, "v1");
+    List<GenericRecord> records = createVariantRecords(v0);
+
+    boolean shouldRead = shouldReadVariant(in("variant_field", v0, v1), 
records);
+    assertThat(shouldRead)
+        .as("Should read: variant in filters must be evaluated post scan")

Review Comment:
   Fixed. Thanks!



##########
parquet/src/main/java/org/apache/iceberg/parquet/ParquetMetricsRowGroupFilter.java:
##########
@@ -376,10 +376,10 @@ public <T> Boolean notEq(BoundReference<T> ref, 
Literal<T> lit) {
     public <T> Boolean in(BoundReference<T> ref, Set<T> literalSet) {
       int id = ref.fieldId();
 
-      // When filtering nested types notNull() is implicit filter passed even 
though complex
-      // filters aren't pushed down in Parquet. Leave all nested column type 
filters to be
+      // Leave all nested column type and variant type filters to be
       // evaluated post scan.
-      if (schema.findType(id) instanceof Type.NestedType) {
+      Type type = schema.findType(id);
+      if (type instanceof Type.NestedType || type.isVariantType()) {

Review Comment:
   Thanks for the suggestion! I’ll keep it inline for now since it’s a single 
check. If this logic spreads or becomes more complex, I’ll extract a private 
helper so we can update it in one place.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to