wgtmac commented on code in PR #1038: URL: https://github.com/apache/parquet-mr/pull/1038#discussion_r1123155108
##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java:
##########
@@ -1011,6 +1012,35 @@ public PageReadStore readFilteredRowGroup(int blockIndex) throws IOException {
     }

     RowRanges rowRanges = getRowRanges(blockIndex);
+    return readFilteredRowGroup(blockIndex, rowRanges);
+  }
+
+  /**
+   * Reads all the columns requested from the specified row group. It may skip specific pages based on the
+   * {@code rowRanges} passed in. As the rows are not aligned among the pages of the different columns row
+   * synchronization might be required. See the documentation of the class SynchronizingColumnReader for details.
+   *
+   * @param blockIndex the index of the requested block
+   * @param rowRanges the row ranges to be read from the requested block
+   * @return the PageReadStore which can provide PageReaders for each column or null if there are no rows in this block
+   * @throws IOException if an error occurs while reading
+   * @throws IllegalArgumentException if the {@code blockIndex} is invalid or the {@code rowRanges} is null
+   */
+  public ColumnChunkPageReadStore readFilteredRowGroup(int blockIndex, RowRanges rowRanges) throws IOException {
+    if (blockIndex < 0 || blockIndex >= blocks.size()) {
+      throw new IllegalArgumentException(String.format("Invalid block index %s, the valid block index range are: " +
+        "[%s, %s]", blockIndex, 0, blocks.size() - 1));
+    }
+
+    if (Objects.isNull(rowRanges)) {
+      throw new IllegalArgumentException("RowRanges must not be null");
+    }
+
+    BlockMetaData block = blocks.get(blockIndex);
+    if (block.getRowCount() == 0L) {
+      throw new ParquetEmptyBlockException("Illegal row group of 0 rows");

Review Comment:
That makes sense.
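For readers following the thread without the full source, the two argument checks in the quoted hunk can be tried in isolation. The sketch below is only an illustration under stated assumptions: `RowGroupArgCheck` and `check` are hypothetical names that do not exist in parquet-mr, the block list is replaced by a plain `blockCount`, and `rowRanges` is typed as `Object` so the example has no Parquet dependency. The error message wording is also simplified.

```java
import java.util.Objects;

// Hypothetical stand-alone helper; it is NOT part of parquet-mr.
public class RowGroupArgCheck {

  // Mirrors the two precondition checks from the quoted hunk.
  // blockCount stands in for blocks.size(); rowRanges stands in for RowRanges.
  static void check(int blockIndex, int blockCount, Object rowRanges) {
    if (blockIndex < 0 || blockIndex >= blockCount) {
      throw new IllegalArgumentException(String.format(
          "Invalid block index %s, valid range is [%s, %s]",
          blockIndex, 0, blockCount - 1));
    }
    if (Objects.isNull(rowRanges)) {
      throw new IllegalArgumentException("RowRanges must not be null");
    }
  }

  public static void main(String[] args) {
    // A valid index with a non-null placeholder passes silently.
    check(0, 3, new Object());

    // An index equal to the block count is rejected.
    try {
      check(3, 3, new Object());
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }

    // A null rowRanges is rejected even when the index is valid.
    try {
      check(0, 3, null);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The index is checked first, so an out-of-range index is reported even when `rowRanges` is also null. The zero-row check from the hunk is left out because it depends on `BlockMetaData`.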