sdf-jkl commented on code in PR #9118:
URL: https://github.com/apache/arrow-rs/pull/9118#discussion_r2813959604


##########
parquet/src/arrow/arrow_reader/selection.rs:
##########
@@ -800,10 +865,23 @@ impl MaskCursor {
             let mut chunk_rows = 0;
             let mut selected_rows = 0;
 
-            // Advance until enough rows have been selected to satisfy the batch size,
-            // or until the mask is exhausted. This mirrors the behaviour of the legacy
-            // `RowSelector` queue-based iteration.
-            while cursor < mask.len() && selected_rows < batch_size {
+            let max_chunk_rows = page_boundaries

Review Comment:
   This way we can avoid a binary search for every `MaskChunk`.
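A minimal sketch of the cost difference, assuming page boundaries are a sorted `&[usize]` of row offsets. The first helper does a fresh binary search per chunk. The second resumes from a stored index, which only moves forward because the cursor position only moves forward. Both helper names are hypothetical, and the truncated `max_chunk_rows` computation in the diff is not reproduced here.

```rust
/// Hypothetical helper: locate the first page boundary strictly after
/// `position` with a binary search. This costs O(log P) on every call.
pub fn next_boundary_binary(boundaries: &[usize], position: usize) -> Option<usize> {
    let idx = boundaries.partition_point(|&b| b <= position);
    boundaries.get(idx).copied()
}

/// Hypothetical helper: the same lookup, resumed from a stored index.
/// Because `position` never moves backwards across calls, `next_idx` never
/// does either, so the total work over all calls is O(P + number of calls).
pub fn next_boundary_monotonic(
    boundaries: &[usize],
    next_idx: &mut usize,
    position: usize,
) -> Option<usize> {
    // Skip every boundary at or behind the current position.
    while *next_idx < boundaries.len() && boundaries[*next_idx] <= position {
        *next_idx += 1;
    }
    boundaries.get(*next_idx).copied()
}
```

Both return the same answer for non-decreasing positions; the monotonic version simply amortizes the search across the whole scan.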



##########
parquet/src/arrow/arrow_reader/selection.rs:
##########
@@ -271,8 +271,64 @@ impl RowSelection {
         })
     }
 
-    /// Returns true if selectors should be forced, preventing mask materialisation
-    pub(crate) fn should_force_selectors(
+    /// Returns row offsets for the starts of skipped pages across projected columns

Review Comment:
   Adds a function that returns the row offsets at which skipped pages start, across all projected columns.
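A minimal sketch of what such a function could compute, under assumed inputs: per-column lists of page start rows and ranges of selected rows. `skipped_page_starts` and its signature are hypothetical, not the PR's implementation. Here a page is treated as skipped when none of its rows is selected, and an offset shared by several columns is reported once.

```rust
use std::collections::BTreeSet;
use std::ops::Range;

/// Hypothetical sketch. `page_starts[c]` lists the first row of each page in
/// projected column `c` (sorted, first entry 0); `selected` lists ranges of
/// selected rows. A page counts as skipped when none of its rows is selected.
pub fn skipped_page_starts(
    page_starts: &[Vec<usize>],
    total_rows: usize,
    selected: &[Range<usize>],
) -> Vec<usize> {
    // A BTreeSet merges offsets shared by several columns and keeps them sorted.
    let mut out = BTreeSet::new();
    for starts in page_starts {
        for (i, &start) in starts.iter().enumerate() {
            // The page covers rows [start, end); the last page ends at total_rows.
            let end = starts.get(i + 1).copied().unwrap_or(total_rows);
            // Empty ranges select nothing, so they never keep a page alive.
            let overlaps = selected
                .iter()
                .any(|r| !r.is_empty() && r.start < end && start < r.end);
            if !overlaps {
                out.insert(start);
            }
        }
    }
    out.into_iter().collect()
}
```

For example, with pages starting at rows `[0, 100, 200]` and `[0, 150]` in two columns of 300 rows and only rows `0..50` selected, it returns `[100, 150, 200]`.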



##########
parquet/src/arrow/arrow_reader/selection.rs:
##########
@@ -770,6 +826,9 @@ pub struct MaskCursor {
     mask: BooleanBuffer,
     /// Current absolute offset into the selection
     position: usize,
+    /// Index of the next page boundary candidate. This advances monotonically

Review Comment:
   Make `MaskCursor` track the index of the next boundary, similar to the idea in https://github.com/sdf-jkl/arrow-rs/pull/2
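One way to read this suggestion, sketched with hypothetical names: the cursor stores the index of the next boundary and ends each chunk when `batch_size` rows are selected, the mask is exhausted, or that boundary is reached. `MaskCursorSketch`, `next_chunk`, and the `Vec<bool>` stand-in for `BooleanBuffer` are assumptions, not the PR's API, and the linked branch's design is not reproduced here.

```rust
/// Hypothetical, simplified stand-in for `MaskCursor`. A plain `Vec<bool>`
/// replaces `BooleanBuffer`; `boundaries` holds sorted row offsets at which
/// a chunk must end.
pub struct MaskCursorSketch {
    pub mask: Vec<bool>,
    pub position: usize,
    pub boundaries: Vec<usize>,
    /// Index of the next boundary candidate; only ever moves forward.
    pub next_boundary: usize,
}

impl MaskCursorSketch {
    pub fn new(mask: Vec<bool>, boundaries: Vec<usize>) -> Self {
        Self { mask, position: 0, boundaries, next_boundary: 0 }
    }

    /// Returns `(chunk_start, chunk_rows, selected_rows)` for the next chunk,
    /// or `None` once the mask is exhausted.
    pub fn next_chunk(&mut self, batch_size: usize) -> Option<(usize, usize, usize)> {
        assert!(batch_size > 0, "batch_size must be positive");
        if self.position >= self.mask.len() {
            return None;
        }
        // Advance past boundaries at or behind the cursor; no binary search.
        while self.next_boundary < self.boundaries.len()
            && self.boundaries[self.next_boundary] <= self.position
        {
            self.next_boundary += 1;
        }
        // The chunk may not cross the next boundary or the end of the mask.
        let limit = self
            .boundaries
            .get(self.next_boundary)
            .copied()
            .unwrap_or(self.mask.len())
            .min(self.mask.len());
        let start = self.position;
        let mut selected = 0;
        while self.position < limit && selected < batch_size {
            if self.mask[self.position] {
                selected += 1;
            }
            self.position += 1;
        }
        Some((start, self.position - start, selected))
    }
}
```

For example, with mask `[T, F, T, T, F, T, T, T]`, boundaries `[3, 6]` and `batch_size = 4`, the chunks are `(0, 3, 2)`, `(3, 3, 2)` and `(6, 2, 2)`: each of the first two stops at a boundary before the batch fills.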



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
