[GitHub] [iceberg] rdblue commented on a change in pull request #1716: Read all row group metadata only when position column is projected

GitBox Fri, 06 Nov 2020 10:28:16 -0800


rdblue commented on a change in pull request #1716:
URL: https://github.com/apache/iceberg/pull/1716#discussion_r518929089




##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/ReadConf.java
##########
@@ -89,8 +91,11 @@
     this.shouldSkip = new boolean[rowGroups.size()];
 
     // Fetch all row groups starting positions to compute the row offsets of 
the filtered row groups
-    Map<Long, Long> offsetToStartPos = generateOffsetToStartPos();
-    this.startRowPositions = new long[rowGroups.size()];
+    Map<Long, Long> offsetToStartPos = null;
+    if (expectedSchema.findField(MetadataColumns.ROW_POSITION.fieldId()) != 
null) {
+      offsetToStartPos = generateOffsetToStartPos();
+      this.startRowPositions = new long[rowGroups.size()];

Review comment:
       I think this should be defined for all cases so that `ParquetReader` 
doesn't need to check whether it is defined. In that case, it should still be a 
`final` variable.
   
   The only change for this PR should be whether `offsetToStartPos` is an empty 
map or the result of `generateOffsetToStartPos()`. We may also want to do the 
`ROW_POSITION` check in that method instead of here, depending on what is 
cleaner.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [iceberg] rdblue commented on a change in pull request #1716: Read all row group metadata only when position column is projected

Reply via email to