kbendick commented on a change in pull request #3723:
URL: https://github.com/apache/iceberg/pull/3723#discussion_r768071019
##########
File path:
parquet/src/main/java/org/apache/iceberg/parquet/ApplyNameMapping.java
##########
@@ -88,6 +96,31 @@ public Type primitive(PrimitiveType primitive) {
return field == null ? primitive : primitive.withId(field.id());
}
+ @Override
+ public void beforeElementField(Type element) {
+ super.beforeElementField(makeElement(element));
+ }
+
+ @Override
+ public void afterElementField(Type element) {
+ super.afterElementField(makeElement(element));
+ }
+
+ private Type makeElement(Type element) {
+ // List's element in 3-level lists can be named differently across different parquet writers.
+ // For example, hive names it "array_element", whereas new parquet writers names it as "element".
+ if (element.getName().equals("element") || element.isPrimitive()) {
+ return element;
+ }
Review comment:
If it's possible to make this more generic and remove the dependency on
the name entirely, that would be great.
Also, could you try writing a test that uses the logic in that file to
build the schema, instead of relying on a checked-in binary file? I know that
might be difficult, but if you wouldn't mind giving it a try, that would be
preferred over checking in a binary file.
As mentioned, here's a similar PR where you can find good test logic for
making arbitrarily named Parquet message types:
https://github.com/apache/iceberg/pull/3309
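
To illustrate what "removing the dependency on the name" could look like, here is a minimal sketch that finds the list element by its position in the schema rather than by its name. It uses a simplified, hypothetical `Node` model instead of the real Parquet `Type` classes, and it assumes a strict 3-level layout (list group, exactly one repeated group, exactly one element field). It deliberately ignores the backward-compatibility rules for 2-level lists, so treat it as an illustration of the idea, not as a drop-in change for `ApplyNameMapping`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of a schema tree; NOT the Parquet Type API.
final class ListElementSketch {
  enum Repetition { REQUIRED, OPTIONAL, REPEATED }

  static final class Node {
    final String name;
    final Repetition repetition;
    final List<Node> children; // empty for primitive fields

    Node(String name, Repetition repetition, Node... children) {
      this.name = name;
      this.repetition = repetition;
      this.children = Collections.unmodifiableList(Arrays.asList(children));
    }

    boolean isPrimitive() {
      return children.isEmpty();
    }
  }

  // Returns the element of a 3-level list by structure alone, or null if the
  // node does not have the assumed shape:
  //   list group -> single REPEATED group -> single element field
  static Node threeLevelElement(Node list) {
    if (list.children.size() != 1) {
      return null;
    }
    Node repeated = list.children.get(0);
    if (repeated.repetition != Repetition.REPEATED || repeated.children.size() != 1) {
      return null;
    }
    return repeated.children.get(0);
  }

  public static void main(String[] args) {
    // Hive-style naming: repeated group "bag", element "array_element".
    Node hiveStyle = new Node("tags", Repetition.OPTIONAL,
        new Node("bag", Repetition.REPEATED,
            new Node("array_element", Repetition.OPTIONAL)));

    // Standard naming: repeated group "list", element "element".
    Node standard = new Node("tags", Repetition.OPTIONAL,
        new Node("list", Repetition.REPEATED,
            new Node("element", Repetition.OPTIONAL)));

    // Both resolve to their element without comparing against a fixed name.
    System.out.println(threeLevelElement(hiveStyle).name);
    System.out.println(threeLevelElement(standard).name);
  }
}
```

Building the two schemas in code, as `main` does here, is also the shape a test could take in place of a checked-in binary file.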
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]