zhuqi-lucas commented on code in PR #8733:
URL: https://github.com/apache/arrow-rs/pull/8733#discussion_r2478069494


##########
parquet/src/arrow/arrow_reader/read_plan.rs:
##########
@@ -249,3 +254,177 @@ impl ReadPlan {
         self.batch_size
     }
 }
+
+/// Cursor for iterating a [`RowSelection`] during execution within a [`ReadPlan`].
+///
+/// This keeps per-reader state such as the current position and delegates the
+/// actual storage strategy to [`RowSelectionBacking`].
+#[derive(Debug)]
+pub struct RowSelectionCursor {
+    /// Backing storage describing how the selection is materialised
+    storage: RowSelectionBacking,
+    /// Current absolute offset into the selection
+    position: usize,
+}
+
+/// Backing storage that powers [`RowSelectionCursor`].
+///
+/// The cursor either walks a boolean mask (dense representation) or a queue
+/// of [`RowSelector`] ranges (sparse representation).
+#[derive(Debug)]
+enum RowSelectionBacking {
+    Mask(BooleanBuffer),
+    Selectors(VecDeque<RowSelector>),
+}
+
+/// Result of computing the next chunk to read when using a bitmap mask
+pub struct MaskChunk {
+    /// Number of leading rows to skip before reaching selected rows
+    pub initial_skip: usize,
+    /// Total rows covered by this chunk (selected + skipped)
+    pub chunk_rows: usize,
+    /// Rows actually selected within the chunk
+    pub selected_rows: usize,
+    /// Starting offset within the mask where the chunk begins
+    pub mask_start: usize,
+}
+
+impl RowSelectionCursor {
+    /// Create a cursor, choosing an efficient backing representation
+    fn new(selectors: Vec<RowSelector>) -> Self {
+        let total_rows: usize = selectors.iter().map(|s| s.row_count).sum();
+        let selector_count = selectors.len();
+        const AVG_SELECTOR_LEN_MASK_THRESHOLD: usize = 8;

Review Comment:
   Nice @hhhizzz, I am wondering if we can switch to a more stable choice, such
as a statistics-based one, but it's a good start for this PR.
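
   For context, here is a minimal sketch of what an average-selector-length heuristic might look like. The hunk above is cut off right after the constant, so the comparison itself is an assumption, and `choose_backing`, `BackingChoice`, and the simplified `RowSelector` are hypothetical names used only for illustration, not the PR's actual code.

```rust
/// Simplified stand-in for parquet's `RowSelector` (illustration only).
#[derive(Debug, Clone, Copy)]
struct RowSelector {
    row_count: usize,
    skip: bool,
}

/// Hypothetical label for the two backing strategies named in the diff.
#[derive(Debug, PartialEq)]
enum BackingChoice {
    Mask,
    Selectors,
}

/// Same constant name as in the diff; how it is applied below is assumed.
const AVG_SELECTOR_LEN_MASK_THRESHOLD: usize = 8;

/// Pick a backing representation from the average selector length.
fn choose_backing(selectors: &[RowSelector]) -> BackingChoice {
    let total_rows: usize = selectors.iter().map(|s| s.row_count).sum();
    let selector_count = selectors.len();

    // Nothing to iterate: keep the (empty) selector queue.
    if selector_count == 0 {
        return BackingChoice::Selectors;
    }

    // Assumption: many short runs favour a dense mask, long runs favour
    // selectors. Multiplying instead of dividing avoids integer truncation.
    if total_rows < selector_count * AVG_SELECTOR_LEN_MASK_THRESHOLD {
        BackingChoice::Mask
    } else {
        BackingChoice::Selectors
    }
}
```

   A fixed threshold like this depends only on the shape of the selection, which is why a statistics-based choice could turn out to be more stable across workloads.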



