amogh-jahagirdar commented on code in PR #14261:
URL: https://github.com/apache/iceberg/pull/14261#discussion_r2412056958


##########
parquet/src/test/java/org/apache/iceberg/parquet/TestPruneColumns.java:
##########
@@ -270,4 +273,64 @@ public void testStructElementName() {
     MessageType actual = ParquetSchemaUtil.pruneColumns(fileSchema, projection);
     assertThat(actual).as("Pruned schema should be matched").isEqualTo(expected);
   }
+
+  @Test
+  public void testVariant() {
+    MessageType fileSchema =
+        Types.buildMessage()
+            .addField(
+                Types.primitive(PrimitiveTypeName.INT32, Type.Repetition.REQUIRED)
+                    .id(1)
+                    .named("id"))
+            .addField(
+                Types.buildGroup(Type.Repetition.OPTIONAL)
+                    .as(LogicalTypeAnnotation.variantType((byte) 1))
+                    .addField(
+                        Types.primitive(PrimitiveTypeName.BINARY, Type.Repetition.REQUIRED)
+                            .named("metadata"))
+                    .addField(
+                        Types.primitive(PrimitiveTypeName.BINARY, Type.Repetition.REQUIRED)
+                            .named("value"))
+                    .id(2)
+                    .named("variant_1"))
+            .addField(
+                Types.buildGroup(Type.Repetition.OPTIONAL)
+                    .as(LogicalTypeAnnotation.variantType((byte) 1))
+                    .addField(
+                        Types.primitive(PrimitiveTypeName.BINARY, Type.Repetition.REQUIRED)
+                            .named("metadata"))
+                    .addField(
+                        Types.primitive(PrimitiveTypeName.BINARY, Type.Repetition.REQUIRED)
+                            .named("value"))
+                    .id(3)
+                    .named("variant_2"))

Review Comment:
   Nit: this test is hard to parse because of all the fields that need to be set up for each variant. Could we abstract that behind a `buildVariant` helper?



##########
parquet/src/test/java/org/apache/iceberg/parquet/TestPruneColumns.java:
##########
@@ -270,4 +273,64 @@ public void testStructElementName() {
     MessageType actual = ParquetSchemaUtil.pruneColumns(fileSchema, projection);
     assertThat(actual).as("Pruned schema should be matched").isEqualTo(expected);
   }
+
+  @Test
+  public void testVariant() {

Review Comment:
   Ideally we would also have an engine round-trip test, which would more easily demonstrate the specific case where this issue surfaces. I see @huaxingao took this up in https://github.com/apache/iceberg/pull/14276.



##########
parquet/src/main/java/org/apache/iceberg/parquet/TypeWithSchemaVisitor.java:
##########
@@ -64,7 +64,7 @@ public static <T> T visit(
       } else if (annotation instanceof LogicalTypeAnnotation.VariantLogicalTypeAnnotation
           || (iType != null && iType.isVariantType())) {
         // when Parquet has a VARIANT logical type, use it here
-        return visitVariant(iType.asVariantType(), group, visitor);
+        return visitVariant(iType != null ? iType.asVariantType() : null, group, visitor);

Review Comment:
   So this NPE would happen when we prune out a variant column that we know we don't need to read; in that case the Iceberg type passed to the visitor is expected to be null. Makes sense, it's really no different from the other cases when pruning.
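
   To make the null case concrete, here is a minimal, self-contained sketch of the guard. It does not use the real Iceberg or Parquet classes: `IcebergType`, the `hasVariantAnnotation` flag, and the string results are hypothetical stand-ins chosen only to show why the unguarded `iType.asVariantType()` call fails for a pruned column and how the ternary avoids it.

```java
public class NullSafeVariantVisitSketch {

  /** Hypothetical stand-in for an Iceberg type; not the real Iceberg Type class. */
  public static class IcebergType {
    private final boolean variant;

    public IcebergType(boolean variant) {
      this.variant = variant;
    }

    public boolean isVariantType() {
      return variant;
    }

    public IcebergType asVariantType() {
      if (!variant) {
        throw new IllegalArgumentException("Not a variant type");
      }
      return this;
    }
  }

  /** Stand-in for visitVariant: reports whether the expected type was present. */
  public static String visitVariant(IcebergType variantType, String groupName) {
    return groupName + (variantType == null ? ":pruned" : ":projected");
  }

  /**
   * Mirrors the shape of the patched branch. When the file schema carries a
   * variant annotation but the column was pruned from the projection, iType
   * is null, so it must not be dereferenced before the null check.
   */
  public static String visit(IcebergType iType, boolean hasVariantAnnotation, String groupName) {
    if (hasVariantAnnotation || (iType != null && iType.isVariantType())) {
      // Guarded call: pass null through instead of calling asVariantType() on null.
      return visitVariant(iType != null ? iType.asVariantType() : null, groupName);
    }
    return groupName + ":other";
  }

  public static void main(String[] args) {
    // Pruned variant column: annotation present in the file, no expected Iceberg type.
    System.out.println(visit(null, true, "variant_2"));
    // Projected variant column: both the annotation and the expected type are present.
    System.out.println(visit(new IcebergType(true), true, "variant_1"));
  }
}
```

   In this sketch the first call stands for the pruned column and the second for a projected one; without the ternary, the first call would throw a NullPointerException.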



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

