alamb commented on code in PR #8733:
URL: https://github.com/apache/arrow-rs/pull/8733#discussion_r2483684259
##########
parquet/src/column/reader.rs:
##########
@@ -214,6 +219,49 @@ where
let remaining_records = max_records - total_records_read;
let remaining_levels = self.num_buffered_values -
self.num_decoded_values;
+ if self.synthetic_page {
Review Comment:
I am similarly still confused.
@hhhizzz your explanation makes sense to me in theory, but I just tested out
removing the synthetic page code from this PR and the tests all still seem to
pass. So that means we either have a testing gap or there is something else
going on:
- https://github.com/hhhizzz/arrow-rs/pull/5
I looked more carefully, and it seems to me that the calculation of what
pages to fetch is still based on `RowSelection` (not the `RowSelectionCursor` /
`RowSelectionBacking`):
https://github.com/apache/arrow-rs/blob/cc1444a3232fa11b8485e2794a88f342bd7f97e2/parquet/src/arrow/async_reader/mod.rs#L983-L995
Thus it does feel possible to have the the situation you explain where pages
needed to evaluate the row selection weren't fetched 🤔
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]