[GitHub] [iceberg] kbendick commented on a change in pull request #3723: Fix Iceberg's parquet writer returning nulls incorrectly for parquet files written by writers that don't use list and element as names.

GitBox Mon, 13 Dec 2021 11:48:48 -0800


kbendick commented on a change in pull request #3723:
URL: https://github.com/apache/iceberg/pull/3723#discussion_r768071019




##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/ApplyNameMapping.java
##########
@@ -88,6 +96,31 @@ public Type primitive(PrimitiveType primitive) {
     return field == null ? primitive : primitive.withId(field.id());
   }
 
+  @Override
+  public void beforeElementField(Type element) {
+    super.beforeElementField(makeElement(element));
+  }
+
+  @Override
+  public void afterElementField(Type element) {
+    super.afterElementField(makeElement(element));
+  }
+
+  private Type makeElement(Type element) {
+    // The element of a 3-level list can be named differently by different Parquet writers.
+    // For example, Hive names it "array_element", whereas newer Parquet writers name it "element".
+    if (element.getName().equals("element") || element.isPrimitive()) {
+      return element;
+    }

Review comment:
       If it's possible to make this more generic and remove the dependency on the name entirely, that would be great.
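One way to drop the dependency on the element's name, sketched here under stated assumptions, is to resolve the element from the shape of the schema. The Parquet format's LogicalTypes documentation gives backward-compatibility rules for LIST. A repeated primitive, a repeated group that does not have exactly one field, or a single-field repeated group named `array` or `<list name>_tuple` is itself the element (a 2-level list). Otherwise the repeated group is the middle level of a 3-level list, and its only child is the element, whatever that child is called. The only name checks left are the spec's own legacy cases on the repeated field. `Node` and `ListElements.elementOf` are hypothetical stand-ins, not Iceberg's or parquet-mr's type API, and this is not necessarily how the PR should implement it.

```java
import java.util.List;

// Hypothetical, minimal model of a Parquet schema node. This is NOT the
// parquet-mr or Iceberg Type API; it carries only what the sketch needs.
final class Node {
  enum Repetition { REQUIRED, OPTIONAL, REPEATED }

  final String name;
  final Repetition repetition;
  final boolean primitive;
  final List<Node> fields; // always empty for primitive nodes

  private Node(String name, Repetition repetition, boolean primitive, List<Node> fields) {
    this.name = name;
    this.repetition = repetition;
    this.primitive = primitive;
    this.fields = fields;
  }

  static Node primitive(String name, Repetition repetition) {
    return new Node(name, repetition, true, List.of());
  }

  static Node group(String name, Repetition repetition, Node... fields) {
    return new Node(name, repetition, false, List.of(fields));
  }
}

final class ListElements {
  // Returns the element field of a LIST-annotated group. The element's own
  // name is never inspected; the only name checks are the spec's legacy
  // 2-level cases, which look at the repeated (middle) field.
  static Node elementOf(Node list) {
    if (list.primitive || list.fields.size() != 1) {
      throw new IllegalArgumentException("LIST must be a group with exactly one field");
    }
    Node repeated = list.fields.get(0);
    if (repeated.repetition != Node.Repetition.REPEATED) {
      throw new IllegalArgumentException("LIST child must be repeated");
    }
    boolean repeatedIsElement =
        repeated.primitive                                // 2-level list of primitives
        || repeated.fields.size() != 1                    // 2-level list of structs (or empty group)
        || repeated.name.equals("array")                  // legacy 2-level name
        || repeated.name.equals(list.name + "_tuple");    // legacy 2-level name
    // Otherwise the repeated group is the synthetic middle level of a 3-level
    // list, and its single child is the element, regardless of its name.
    return repeatedIsElement ? repeated : repeated.fields.get(0);
  }

  public static void main(String[] args) {
    Node.Repetition opt = Node.Repetition.OPTIONAL;
    Node.Repetition rep = Node.Repetition.REPEATED;
    // Element named "array_element", as in the Hive example from the diff comment.
    Node hive = Node.group("points", opt,
        Node.group("bag", rep, Node.primitive("array_element", opt)));
    // Element named "element", as written by newer Parquet writers.
    Node standard = Node.group("points", opt,
        Node.group("list", rep, Node.primitive("element", opt)));
    System.out.println(elementOf(hive).name);     // array_element
    System.out.println(elementOf(standard).name); // element
  }
}
```

Because the 3-level case is decided structurally, a writer that invents yet another element name needs no new special case.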
   
   Also, could you try to write a test that uses the logic in that file to build the schema, instead of relying on a checked-in binary file? I know that might be difficult, but if you wouldn't mind giving it a try, that would be preferable to checking in a binary file.
   
   As mentioned, here's a similar PR where you can find good test logic for building arbitrarily named Parquet message types: https://github.com/apache/iceberg/pull/3309
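For the test, one option (a sketch, not a claim about what PR #3309 does) is to generate the schema text with the names under test and parse it with parquet-mr's `MessageTypeParser.parseMessageType(String)`, which returns a `MessageType`, instead of reading a checked-in file. `ListSchemas.threeLevelList` is a hypothetical helper. Only `array_element` comes from the diff comment; `bag` is used as the repeated group's name on the assumption that it matches Hive's writer.

```java
// Hypothetical test helper: renders Parquet schema text for a single 3-level
// list column whose repeated (middle) group and element use caller-chosen names.
final class ListSchemas {
  static String threeLevelList(String column, String repeatedName,
                               String elementName, String elementType) {
    return "message table {\n"
        + "  optional group " + column + " (LIST) {\n"
        + "    repeated group " + repeatedName + " {\n"
        + "      optional " + elementType + " " + elementName + ";\n"
        + "    }\n"
        + "  }\n"
        + "}";
  }

  public static void main(String[] args) {
    // Hive-style naming (repeated group name assumed to be "bag").
    System.out.println(threeLevelList("points", "bag", "array_element", "int32"));
    // Naming used by newer Parquet writers.
    System.out.println(threeLevelList("points", "list", "element", "int32"));
  }
}
```

Parameterizing a test over (repeated name, element name) pairs would let one test cover several writer conventions without any binary fixtures.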



