zhuqi-lucas commented on code in PR #7537:
URL: https://github.com/apache/arrow-rs/pull/7537#discussion_r2102118063


##########
parquet/src/arrow/arrow_reader/read_plan.rs:
##########
@@ -231,14 +306,20 @@ impl LimitedReadPlanBuilder {
 pub(crate) struct ReadPlan {
     /// The number of rows to read in each batch
     batch_size: usize,
-    /// Row ranges to be selected from the data source
-    selection: Option<VecDeque<RowSelector>>,
+    /// In what pattern to decode the rows from the Parquet file
+    ///
+    /// This is a queue of [`RowSelector`]s that are guaranteed:
+    /// 1. To have no empty selections (that select no rows)
+    /// 2. fall on a batch_size boundary (e.g. 0, 100, 200, 300)
+    ///
+    /// TODO change this structure to an enum with emit + mask

Review Comment:
   Thank you @alamb, this is a great idea: it means we can build the
range/bitmap at build time, and the adaptive policy can also be applied here.



##########
parquet/src/arrow/arrow_reader/read_plan.rs:
##########
@@ -247,3 +328,23 @@ impl ReadPlan {
         self.batch_size
     }
 }
+
+/// How to select the next batch of rows to read from the Parquet file
+///
+/// This allows the reader to dynamically choose between decoding strategies
+pub(crate) enum RowsPlan {

Review Comment:
   Beautiful enum; it will cover all the cases!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
