SinghAsDev commented on a change in pull request #3723:
URL: https://github.com/apache/iceberg/pull/3723#discussion_r768113767
##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/ApplyNameMapping.java
##########
@@ -88,6 +96,31 @@ public Type primitive(PrimitiveType primitive) {
return field == null ? primitive : primitive.withId(field.id());
}
+ @Override
+ public void beforeElementField(Type element) {
+ super.beforeElementField(makeElement(element));
+ }
+
+ @Override
+ public void afterElementField(Type element) {
+ super.afterElementField(makeElement(element));
+ }
+
+ private Type makeElement(Type element) {
+ // List's element in 3-level lists can be named differently across different parquet writers.
+ // For example, hive names it "array_element", whereas new parquet writers names it as "element".
+ if (element.getName().equals("element") || element.isPrimitive()) {
+ return element;
+ }
Review comment:
I tried to remove the hard-coded "element" name, but there isn't a clean way to do so. The only approach I could come up with is to build a dummy list and then extract the element from it, and I am not sure that is any cleaner than the current approach. Below is the change I am referring to.
```java
private Type makeElement(Type element) {
  // List's element in 3-level lists can be named differently across different parquet writers.
  // For example, hive names it "array_element", whereas newer parquet writers name it "element".
  if (element.isPrimitive()) {
    return element;
  }

  // Rebuild the element inside a throwaway 3-level list so that Parquet's list builder,
  // rather than a string literal, supplies the standard element name.
  Types.BaseListBuilder.GroupElementBuilder<GroupType, Types.ListBuilder<GroupType>> dummyBuilder =
      Types.list(Type.Repetition.OPTIONAL)
          .groupElement(element.getRepetition())
          .addFields(element.asGroupType().getFields().toArray(new Type[0]));
  if (element.getId() != null) {
    dummyBuilder.id(element.getId().intValue());
  }
  // dummy (LIST) -> repeated "list" group -> rebuilt element
  return dummyBuilder.named("dummy").getType(0).asGroupType().getType(0);
}
```
I don't have a strong preference either way. @kbendick, let me know if you prefer this version and I will make the change.
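The following is a minimal sketch of what `makeElement` is for, written without a Parquet dependency. It uses a hypothetical `SimpleType` class in place of Parquet's `Type`. It also assumes that the name mapping keys a list's element as "element", which means an element that Hive wrote as "array_element" only resolves after it is renamed. `SimpleType`, `ElementNames`, `normalize`, and `lookupId` are illustrative names only. They are not Iceberg or Parquet APIs.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a Parquet type (NOT org.apache.parquet.schema.Type).
final class SimpleType {
  final String name;
  final Integer id;               // field id; null when the file carries none
  final List<SimpleType> fields;  // null for primitives

  SimpleType(String name, Integer id, List<SimpleType> fields) {
    this.name = name;
    this.id = id;
    this.fields = fields;
  }

  boolean isPrimitive() {
    return fields == null;
  }
}

final class ElementNames {
  // Assumed canonical element name; Hive writes "array_element" instead.
  static final String CANONICAL = "element";

  // Mirrors the shape of makeElement: primitives and already-canonical elements
  // pass through unchanged; any other group element is rebuilt under the
  // canonical name, keeping its field id and children.
  static SimpleType normalize(SimpleType element) {
    if (element.isPrimitive() || element.name.equals(CANONICAL)) {
      return element;
    }
    return new SimpleType(CANONICAL, element.id, element.fields);
  }

  // Hypothetical name-based lookup: resolves only because the name was normalized first.
  static Integer lookupId(Map<String, Integer> mapping, SimpleType element) {
    return mapping.get(normalize(element).name);
  }
}
```

Both variants of `makeElement` in this PR fill the role of `normalize` here. The dummy-list variant differs only in letting Parquet's builder provide the name instead of the literal.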