tustvold commented on code in PR #3633:
URL: https://github.com/apache/arrow-rs/pull/3633#discussion_r1090511623
##########
parquet/src/arrow/arrow_reader/selection.rs:
##########
@@ -110,8 +111,18 @@ impl RowSelection {
Self::from_consecutive_ranges(iter, total_rows)
}
+ /// Creates a [`RowSelection`] that will select `limit` rows and skip all remaining rows.
+ pub(crate) fn from_limit(limit: usize, total_rows: usize) -> Self {
+ Self {
+ selectors: vec![
+ RowSelector::select(limit),
+ RowSelector::skip(total_rows.saturating_sub(limit)),
Review Comment:
```suggestion
```
This shouldn't be necessary: `RowSelection::trim` will remove it anyway.
It does make me wonder whether this method is really needed at all :thinking:
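To illustrate why the trailing skip is redundant, here is a minimal sketch on a simplified stand-in type. `Selector` and `trim_trailing_skips` are hypothetical names for illustration only, not the crate's `RowSelector` or `RowSelection::trim`; the assumption is just that trimming drops trailing skip runs.

```rust
/// Simplified stand-in for a row selector (hypothetical, illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Selector {
    row_count: usize,
    skip: bool,
}

impl Selector {
    fn select(row_count: usize) -> Self {
        Self { row_count, skip: false }
    }

    fn skip(row_count: usize) -> Self {
        Self { row_count, skip: true }
    }
}

/// Drop trailing skip runs. They select nothing, so removing them
/// does not change which rows end up being read.
fn trim_trailing_skips(mut selectors: Vec<Selector>) -> Vec<Selector> {
    while matches!(selectors.last(), Some(s) if s.skip) {
        selectors.pop();
    }
    selectors
}
```

Under that assumption, `[select(limit), skip(total_rows - limit)]` trims down to `[select(limit)]`, so the explicit skip adds nothing.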
##########
parquet/src/arrow/arrow_reader/mod.rs:
##########
@@ -167,6 +170,17 @@ impl<T> ArrowReaderBuilder<T> {
..self
}
}
+
+ /// Provide a limit to the number of rows to be read
+ ///
+ /// The limit will be used to generate an `RowSelection` so only `limit`
+ /// rows are decoded
Review Comment:
```suggestion
/// The limit will be applied after any [`Self::with_row_selection`] and [`Self::with_row_filter`]
/// allowing it to limit the final set of rows decoded after any pushed down predicates
```
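The ordering matters because a limit applied before a predicate can return fewer rows than requested. A rough sketch using plain iterators, not the parquet API; both helper names are hypothetical:

```rust
/// Hypothetical helper: apply the predicate first, then the limit.
fn filter_then_limit(rows: &[u32], keep: impl Fn(u32) -> bool, limit: usize) -> Vec<u32> {
    rows.iter().copied().filter(|r| keep(*r)).take(limit).collect()
}

/// Hypothetical helper: apply the limit first, then the predicate.
fn limit_then_filter(rows: &[u32], keep: impl Fn(u32) -> bool, limit: usize) -> Vec<u32> {
    rows.iter().copied().take(limit).filter(|r| keep(*r)).collect()
}
```

With rows `0..10`, a predicate keeping even values and a limit of 3, the first helper yields three rows while the second yields only two, which is why the doc comment should state that the limit comes last.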
##########
parquet/src/arrow/arrow_reader/selection.rs:
##########
@@ -371,6 +382,35 @@ impl RowSelection {
self
}
+ /// Limit this [`RowSelection`] to only select `limit` rows
+ pub(crate) fn limit(mut self, mut limit: usize) -> Self {
+ let mut remaining = 0;
Review Comment:
This could be more simply implemented as
```
Self {
    selectors: intersect_row_selections(&self.selectors, &[RowSelector::select(limit)]),
}
```
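For comparison, the behaviour the doc comment describes (keep only the first `limit` selected rows) can be sketched on a simplified stand-in type. `Selector` and `limit_selectors` are hypothetical and not part of the crate; this says nothing about how `intersect_row_selections` itself behaves, so it would be worth checking that the two agree when the selection starts with a skip.

```rust
/// Simplified stand-in for a row selector (hypothetical, illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Selector {
    row_count: usize,
    skip: bool,
}

/// Keep selectors until `limit` rows have been selected, truncating the
/// select run that crosses the limit and dropping everything after it.
fn limit_selectors(selectors: &[Selector], mut limit: usize) -> Vec<Selector> {
    let mut out = Vec::new();
    for s in selectors {
        if limit == 0 {
            break;
        }
        if s.skip {
            // Skipped rows do not count towards the limit.
            out.push(*s);
        } else {
            let take = s.row_count.min(limit);
            out.push(Selector { row_count: take, skip: false });
            limit -= take;
        }
    }
    out
}
```

A test along these lines, with a leading skip and a limit that falls in the middle of a select run, would cover the interesting cases.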
##########
parquet/src/arrow/arrow_reader/mod.rs:
##########
@@ -453,6 +467,17 @@ impl<T: ChunkReader + 'static> ArrowReaderBuilder<SyncReader<T>> {
selection = Some(RowSelection::from(vec![]));
}
+ // If a limit is defined, apply it to the final `RowSelection`
Review Comment:
We also need to do something similar for the async reader.