ggershinsky commented on a change in pull request #945:
URL: https://github.com/apache/parquet-mr/pull/945#discussion_r825408797
##########
File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/InternalParquetRecordReader.java
##########
@@ -265,4 +275,46 @@ public boolean nextKeyValue() throws IOException, InterruptedException {
return Collections.unmodifiableMap(setMultiMap);
}
+ /**
+ * Returns the row index of the current row. If no row has been processed or if the
+ * row index information is unavailable from the underlying @{@link PageReadStore}, returns -1.
+ */
+ public long getCurrentRowIndex() {
+ if (current == 0L || rowIdxInFileItr == null) {
+ return -1;
+ }
+ return currentRowIdx;
+ }
+
+ /**
+ * Resets the row index iterator based on the current processed row group.
+ */
+ private void resetRowIndexIterator(PageReadStore pages) {
+ Optional<Long> rowGroupRowIdxOffset = pages.getRowIndexOffset();
+ currentRowIdx = -1;
+ if (rowGroupRowIdxOffset.isPresent()) {
+ final PrimitiveIterator.OfLong rowIdxInRowGroupItr;
+ if (pages.getRowIndexes().isPresent()) {
+ rowIdxInRowGroupItr = pages.getRowIndexes().get();
+ } else {
+ rowIdxInRowGroupItr = LongStream.range(0, pages.getRowCount()).iterator();
+ }
+ // Adjust the row group offset in the `rowIndexWithinRowGroupIterator` iterator.
+ this.rowIdxInFileItr = new PrimitiveIterator.OfLong() {
+ public long nextLong() {
+ return rowGroupRowIdxOffset.get() + rowIdxInRowGroupItr.nextLong();
+ }
+
+ public boolean hasNext() {
+ return rowIdxInRowGroupItr.hasNext();
+ }
+
+ public Long next() {
+ return rowGroupRowIdxOffset.get() + rowIdxInRowGroupItr.next();
+ }
+ };
+ } else {
Review comment:
nit: could you start the method by checking this condition
(!rowGroupRowIdxOffset.isPresent()) and returning early? That removes the
else branch and one level of nesting, so the method will look cleaner.
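
For illustration, here is a minimal sketch of the early-return shape this nit asks for. It is not the PR's code: `RowIndexSketch` and the `Pages` interface are hypothetical stand-ins that mirror only the three `PageReadStore` calls visible in the hunk, and the body of the `else` branch is cut off in the quoted diff, so clearing the iterator there is an assumption.

```java
import java.util.Optional;
import java.util.PrimitiveIterator;
import java.util.stream.LongStream;

class RowIndexSketch {
  /** Hypothetical stand-in for the PageReadStore methods used in the hunk. */
  interface Pages {
    Optional<Long> getRowIndexOffset();
    Optional<PrimitiveIterator.OfLong> getRowIndexes();
    long getRowCount();
  }

  long currentRowIdx = -1;
  PrimitiveIterator.OfLong rowIdxInFileItr;

  void resetRowIndexIterator(Pages pages) {
    Optional<Long> rowGroupRowIdxOffset = pages.getRowIndexOffset();
    currentRowIdx = -1;

    // Guard clause first: nothing to build when the offset is unavailable.
    if (!rowGroupRowIdxOffset.isPresent()) {
      // Assumption: the else branch (not shown in the hunk) clears the iterator.
      rowIdxInFileItr = null;
      return;
    }

    final long offset = rowGroupRowIdxOffset.get();
    final PrimitiveIterator.OfLong rowIdxInRowGroupItr;
    if (pages.getRowIndexes().isPresent()) {
      rowIdxInRowGroupItr = pages.getRowIndexes().get();
    } else {
      // No filtered row indexes: every row in the row group is read in order.
      rowIdxInRowGroupItr = LongStream.range(0, pages.getRowCount()).iterator();
    }

    // Shift row-group-relative indexes by the row group's offset in the file.
    rowIdxInFileItr = new PrimitiveIterator.OfLong() {
      @Override public long nextLong() { return offset + rowIdxInRowGroupItr.nextLong(); }
      @Override public boolean hasNext() { return rowIdxInRowGroupItr.hasNext(); }
      @Override public Long next() { return offset + rowIdxInRowGroupItr.next(); }
    };
  }
}
```

With the guard clause at the top, the main path is no longer nested inside `if (rowGroupRowIdxOffset.isPresent())`, and the offset can be unwrapped once instead of calling `get()` inside each iterator method.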