shangxinli commented on a change in pull request #1566:
URL: https://github.com/apache/iceberg/pull/1566#discussion_r506766877
##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/ParquetReader.java
##########
@@ -130,13 +139,23 @@ private void advance() {
PageReadStore pages;
try {
- pages = reader.readNextRowGroup();
+ // Because of the issue of PARQUET-1901, we cannot blindly call readNextFilteredRowGroup()
+ if (hasRecordFilter) {
+ pages = reader.readNextFilteredRowGroup();
+ } else {
+ pages = reader.readNextRowGroup();
+ }
} catch (IOException e) {
throw new RuntimeIOException(e);
}
+ long blockRowCount = blocks.get(nextRowGroup).getRowCount();
+ Preconditions.checkState(blockRowCount >= pages.getRowCount(),
+ "Number of values in the block, %s, does not great or equal number of values after filtering, %s",
+ blockRowCount, pages.getRowCount());
long rowPosition = rowGroupsStartRowPos[nextRowGroup];
Review comment:
This is such an important comment! That case is absolutely possible and we
need to handle it. I looked at the code again and found that even the
existing code is missing a 'pages == null' check. It seems valid
for readNextRowGroup() to return null:
https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java#L899.
In that case the caller needs to handle the null, but currently it does not.
It is more complicated for readNextFilteredRowGroup() because it can advance
internally without the caller knowing. I will add some handling code there.
If you have suggestions, let me know.
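
A minimal sketch of the missing 'pages == null' check described above. It
does not use the real Parquet or Iceberg classes: Pages, FakeReader and
countRows are hypothetical stand-ins, and the only behavior assumed is that
the read method returns null once no row groups remain.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class NullRowGroupSketch {
  /** Hypothetical stand-in for a page store; only the row count matters here. */
  static final class Pages {
    final long rowCount;

    Pages(long rowCount) {
      this.rowCount = rowCount;
    }
  }

  /** Hypothetical reader whose read method returns null when it is exhausted. */
  static final class FakeReader {
    private final Queue<Pages> remaining = new ArrayDeque<>();

    FakeReader(long... rowCounts) {
      for (long count : rowCounts) {
        remaining.add(new Pages(count));
      }
    }

    Pages readNextRowGroup() {
      // poll() returns null on an empty queue, mimicking "no more row groups".
      return remaining.poll();
    }
  }

  /** Sums row counts and stops cleanly on null instead of dereferencing it. */
  static long countRows(FakeReader reader) {
    long total = 0;
    while (true) {
      Pages pages = reader.readNextRowGroup();
      if (pages == null) {
        // Without this check, pages.rowCount below would throw a NullPointerException.
        break;
      }
      total += pages.rowCount;
    }
    return total;
  }

  public static void main(String[] args) {
    System.out.println(countRows(new FakeReader(3, 5)));
  }
}
```

The filtered case is not modeled here. Because readNextFilteredRowGroup()
can skip row groups internally, a real fix would presumably also need to
keep the caller's row group index in sync with the reader.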