rdblue commented on PR #6935:
URL: https://github.com/apache/iceberg/pull/6935#issuecomment-1445436703

   > The `readFilteredRowGroup` method provided by Parquet will detect whether there is a filter pushed down, and only return the filtered row group when there is a push-down filter.
   
   I commented where we set the row ranges for the row group. I think that should work with Parquet, but it's been a while since I looked at it. Getting a public API call into Parquet would make this easier.
   
   > I think RowRanges is also at the [row group level](https://github.com/apache/parquet-mr/blob/c9cfe821448a2f99797fda7f46c70a16cc1250a9/parquet-column/src/main/java/org/apache/parquet/internal/filter2/columnindex/RowRanges.java#L33); Parquet-mr will [generate](https://github.com/apache/parquet-mr/blob/c9cfe821448a2f99797fda7f46c70a16cc1250a9/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java#L1142) a RowRanges for each row group when running the column index filter.
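   To make the per-row-group idea concrete, here is a simplified, hypothetical model (not parquet-mr code) of how page-level filter results could become row ranges for a single row group: each page that may match contributes the rows from its first row index up to the row before the next page starts, and adjacent ranges are merged. `PageRowRanges`, `Range`, and `fromPages` are illustrative names, and the sketch ignores how parquet-mr combines ranges across several filtered columns.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model: turn page-level filter matches into
 * row ranges for ONE row group. Not the parquet-mr implementation.
 */
final class PageRowRanges {
  /** Inclusive range of row indexes within one row group. */
  record Range(long from, long to) {}

  /**
   * @param firstRowIndexes first row index of each page (ascending, starting at 0)
   * @param rowGroupRowCount total number of rows in the row group
   * @param pageMatches whether the filter may match any row in each page
   */
  static List<Range> fromPages(long[] firstRowIndexes, long rowGroupRowCount, boolean[] pageMatches) {
    List<Range> ranges = new ArrayList<>();
    for (int i = 0; i < firstRowIndexes.length; i++) {
      if (!pageMatches[i]) {
        continue; // page can be skipped entirely
      }
      long from = firstRowIndexes[i];
      // A page ends one row before the next page starts (or at the end of the row group).
      long end = (i + 1 < firstRowIndexes.length) ? firstRowIndexes[i + 1] : rowGroupRowCount;
      long to = end - 1;
      // Merge with the previous range when the pages are adjacent.
      if (!ranges.isEmpty() && ranges.get(ranges.size() - 1).to() + 1 == from) {
        Range last = ranges.remove(ranges.size() - 1);
        ranges.add(new Range(last.from(), to));
      } else {
        ranges.add(new Range(from, to));
      }
    }
    return ranges;
  }
}
```

For example, four pages starting at rows 0, 100, 200, and 300 in a 350-row group, where only the third page is pruned, yield the ranges [0, 199] and [300, 349].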
   
   Great! I put this together by trying to reverse engineer what was going on 
in Parquet, so I must have gotten it right.
   
   > We will have to wait for the next version to use it, but Parquet-mr may have a release in a month (see this comment), and we should be able to catch up with that release. If you agree, I can open a PR in the Parquet-mr repo.
   
   I'm all for adding what we need to Parquet. We can continue to use 
reflection until it is available.
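   Until the call is public, a reflective lookup with a fallback could look like the minimal sketch below. It uses only plain `java.lang.reflect`; `ReflectiveCall`, `findMethod`, and `invoke` are hypothetical helper names, not necessarily what this PR does.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Optional;

/** Hypothetical helper for calling a method that may not be public (or present) yet. */
final class ReflectiveCall {
  private ReflectiveCall() {}

  /**
   * Looks up a method declared directly on {@code clazz} (inherited methods are not
   * searched). Returns empty when this version of the library does not have it.
   */
  static Optional<Method> findMethod(Class<?> clazz, String name, Class<?>... paramTypes) {
    try {
      Method method = clazz.getDeclaredMethod(name, paramTypes);
      method.setAccessible(true); // allow calling non-public methods
      return Optional.of(method);
    } catch (NoSuchMethodException e) {
      return Optional.empty(); // caller falls back to the non-filtered path
    }
  }

  /** Invokes the method, unwrapping exceptions thrown by the target. */
  static Object invoke(Method method, Object target, Object... args) {
    try {
      return method.invoke(target, args);
    } catch (IllegalAccessException e) {
      throw new IllegalStateException("Cannot access " + method.getName(), e);
    } catch (InvocationTargetException e) {
      throw new RuntimeException(e.getCause());
    }
  }
}
```

Looking the method up once and caching the `Method` keeps the reflection cost out of the per-row-group read path; once the API is public, the lookup can be replaced by a direct call.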
   
   I think the main thing is that we don't currently handle the row ranges after we've skipped reading the pages. So the next steps are to verify what's in this PR and then to update the read paths so that values for the skipped rows are also skipped.
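   For the read-path change, the core step is dropping values for rows outside the row ranges, even when they come from pages that were read (a page can overlap a range only partially). A minimal sketch, assuming a flat (non-repeated) column with exactly one value per row and sorted, non-overlapping inclusive ranges; `RowRangeSkipper` and `selectRows` are hypothetical names, and a real reader would skip values inside the column decoders instead of materializing them first.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: keep only values whose row index is inside the row ranges. */
final class RowRangeSkipper {
  /** Inclusive range of row indexes within one row group. */
  record Range(long from, long to) {}

  /**
   * @param values one value per row, in row order (assumes a flat column)
   * @param firstRowIndex row index of values.get(0) within the row group, e.g. a page start
   * @param ranges sorted, non-overlapping inclusive row ranges to keep
   */
  static <T> List<T> selectRows(List<T> values, long firstRowIndex, List<Range> ranges) {
    List<T> selected = new ArrayList<>();
    int r = 0;
    for (int i = 0; i < values.size(); i++) {
      long row = firstRowIndex + i;
      // Advance past ranges that end before the current row.
      while (r < ranges.size() && ranges.get(r).to() < row) {
        r++;
      }
      if (r == ranges.size()) {
        break; // no ranges left, so every remaining row is skipped
      }
      if (row >= ranges.get(r).from()) {
        selected.add(values.get(i)); // row is inside the current range
      }
    }
    return selected;
  }
}
```

With values `a` through `f` for rows 100 to 105 and ranges [101, 102] and [105, 200], only `b`, `c`, and `f` are kept.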
