alamb commented on code in PR #8733:
URL: https://github.com/apache/arrow-rs/pull/8733#discussion_r2483684259


##########
parquet/src/column/reader.rs:
##########
@@ -214,6 +219,49 @@ where
             let remaining_records = max_records - total_records_read;
             let remaining_levels = self.num_buffered_values - 
self.num_decoded_values;
 
+            if self.synthetic_page {

Review Comment:
   I am similarly still confused. 
   
   @hhhizzz your explanation makes sense to me in theory, but I just tested out 
removing the synthetic page code from this PR and the tests all still seem to 
pass. So that means we either have a testing gap or there is something else 
going on:
   - https://github.com/hhhizzz/arrow-rs/pull/5
   
   I looked more carefully, and it seems to me that the calculation of what 
pages to fetch is still based on `RowSelection` (not the `RowSelectionCursor` / 
`RowSelectionBacking`):
   
https://github.com/apache/arrow-rs/blob/cc1444a3232fa11b8485e2794a88f342bd7f97e2/parquet/src/arrow/async_reader/mod.rs#L983-L995
   
   Thus it does feel possible to have the the situation you explain where pages 
needed to evaluate the row selection weren't fetched 🤔 
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to