kbendick commented on code in PR #4627:
URL: https://github.com/apache/iceberg/pull/4627#discussion_r859361411
##########
flink/v1.14/flink/src/main/java/org/apache/iceberg/flink/data/FlinkParquetReaders.java:
##########
@@ -116,7 +116,11 @@ public ParquetValueReader<RowData> struct(Types.StructType expected, GroupType s
       int id = field.fieldId();
       if (idToConstant.containsKey(id)) {
         // containsKey is used because the constant may be null
-        reorderedFields.add(ParquetValueReaders.constant(idToConstant.get(id)));
+
+        // We use the max definition level of the parent node to infer the max definition
+        // level of the constant field in case we could not find the given parquet field
+        // with typesById.
+        int fieldD = type.getMaxDefinitionLevel(currentPath());
Review Comment:
Based on the comment, we only need `fieldD` if the `typesById` lookup fails.
But if we look at lines 101-107 of this file, very similar logic already exists.
Would it be more appropriate to move this logic up closer to lines 101-107,
where `readersById` and `typesById` are constructed and there's already a call
to `type.getMaxDefinitionLevel`? Perhaps in an `else` block, so that the
`typesById` mapping is always complete? A rough sketch of what I have in mind
follows below.
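To make the suggestion concrete, here is a minimal sketch. It assumes the loop
at lines 101-107 iterates `struct.getFields()` and keys `readersById`/`typesById`
by field id, and that `ParquetValueReaders.constant` gains the two-argument
overload this PR is introducing; the map name `maxDefinitionLevelsById` is
hypothetical:

```java
// While building readersById/typesById (around lines 101-107), also record each
// field's max definition level, so the constant branch never has to recompute
// it from the parent path.
Map<Integer, Integer> maxDefinitionLevelsById = Maps.newHashMap();

List<Type> fields = struct.getFields();
for (int i = 0; i < fields.size(); i += 1) {
  Type fieldType = fields.get(i);
  int fieldD = type.getMaxDefinitionLevel(path(fieldType.getName())) - 1;
  if (fieldType.getId() != null) {
    int id = fieldType.getId().intValue();
    readersById.put(id, ParquetValueReaders.option(fieldType, fieldD, fieldReaders.get(i)));
    typesById.put(id, fieldType);
    maxDefinitionLevelsById.put(id, fieldD);
  }
}

// Later, in the constant-field branch: fall back to the parent's max definition
// level only when the field id was never seen while building the maps above.
if (idToConstant.containsKey(id)) {
  // containsKey is used because the constant may be null
  int fieldD = maxDefinitionLevelsById.getOrDefault(
      id, type.getMaxDefinitionLevel(currentPath()));
  reorderedFields.add(ParquetValueReaders.constant(idToConstant.get(id), fieldD));
}
```

That would keep all of the definition-level bookkeeping in one place, so the
fallback to the parent path reads as the exception rather than a second code
path.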