amogh-jahagirdar commented on code in PR #7279:
URL: https://github.com/apache/iceberg/pull/7279#discussion_r1161764815
##########
parquet/src/main/java/org/apache/iceberg/parquet/VectorizedParquetReader.java:
##########
@@ -154,21 +177,47 @@ public T next() {
}
private void advance() {
- while (shouldSkip[nextRowGroup]) {
- nextRowGroup += 1;
- reader.skipNextRowGroup();
- }
- PageReadStore pages;
try {
- pages = reader.readNextRowGroup();
- } catch (IOException e) {
- throw new RuntimeIOException(e);
+ Preconditions.checkNotNull(prefetchRowGroupFuture, "future should not be null");
+ PageReadStore pages = prefetchRowGroupFuture.get();
+
+ if (prefetchedRowGroup >= totalRowGroups) {
+ return;
+ }
+ Preconditions.checkState(
+ pages != null,
+ "advance() should have been only when there was at least one row group to read");
+ long rowPosition = rowGroupsStartRowPos[prefetchedRowGroup];
+ model.setRowGroupInfo(pages, columnChunkMetadata.get(prefetchedRowGroup), rowPosition);
+ nextRowGroupStart += pages.getRowCount();
+ prefetchedRowGroup += 1;
+ prefetchNextRowGroup(); // eagerly fetch the next row group
Review Comment:
I'm in favor of making the "eagerness" of the prefetch configurable, as
long as we ship a sane default. We can measure what an appropriate value
should be, but it needs to balance memory consumption against
performance.
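
To make the suggestion concrete, here is a rough sketch of what a
configurable prefetch depth could look like. Everything in it is
hypothetical: `BoundedPrefetcher`, `DEFAULT_DEPTH`, and the `loader`
function are illustration-only names, not part of this PR or of Iceberg.
It assumes row groups can be loaded by index, and it picks a default of 1
only because that matches the single-row-group lookahead in the diff
above. The right default would still need to be measured.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.IntFunction;

/** Hypothetical helper (not Iceberg API): keeps at most `depth` loads in flight. */
class BoundedPrefetcher<T> {
  // Assumed default: one unit ahead, mirroring the lookahead in the diff.
  static final int DEFAULT_DEPTH = 1;

  private final IntFunction<T> loader; // stands in for reading row group i
  private final int total;             // number of units to read
  private final int depth;             // maximum number of outstanding prefetches
  private final ExecutorService pool;
  private final Deque<CompletableFuture<T>> inFlight = new ArrayDeque<>();
  private int nextToSubmit = 0;

  BoundedPrefetcher(IntFunction<T> loader, int total, int depth, ExecutorService pool) {
    if (depth < 1) {
      throw new IllegalArgumentException("depth must be at least 1");
    }
    this.loader = loader;
    this.total = total;
    this.depth = depth;
    this.pool = pool;
    fill();
  }

  // Submit loads until `depth` are outstanding or nothing is left to submit.
  private void fill() {
    while (inFlight.size() < depth && nextToSubmit < total) {
      final int index = nextToSubmit++;
      inFlight.add(CompletableFuture.supplyAsync(() -> loader.apply(index), pool));
    }
  }

  boolean hasNext() {
    return !inFlight.isEmpty();
  }

  // Block on the oldest outstanding load, then top the queue back up.
  T next() {
    CompletableFuture<T> head = inFlight.poll();
    if (head == null) {
      throw new NoSuchElementException("no more units to read");
    }
    T value = head.join();
    fill();
    return value;
  }

  int inFlightCount() {
    return inFlight.size();
  }
}
```

With this shape, memory is bounded by roughly `depth` buffered units plus
the one the caller is currently consuming, so the setting maps directly
onto the memory versus throughput trade-off.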
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]