alamb commented on code in PR #8733:
URL: https://github.com/apache/arrow-rs/pull/8733#discussion_r2510979304
##########
parquet/src/arrow/arrow_reader/mod.rs:
##########
@@ -212,6 +217,14 @@ impl<T> ArrowReaderBuilder<T> {
}
}
+ /// Configure how row selections should be materialised during execution
Review Comment:
It would be nice, eventually, to change RowSelection so the callers could
provide a bitmap as well.
I will think about how this might look, but it is not something to change in
this PR
##########
parquet/src/arrow/arrow_reader/mod.rs:
##########
@@ -43,6 +43,7 @@ use crate::file::reader::{ChunkReader, SerializedPageReader};
use crate::schema::types::SchemaDescriptor;
use crate::arrow::arrow_reader::metrics::ArrowReaderMetrics;
+// Exposed so integration tests and benchmarks can temporarily override the
threshold.
Review Comment:
I think there are other use cases too (aka I am not sure this comment is
accurate anymore)
##########
parquet/src/arrow/arrow_reader/mod.rs:
##########
@@ -934,12 +950,16 @@ impl<T: ChunkReader + 'static>
ParquetRecordBatchReaderBuilder<T> {
let array_reader = ArrayReaderBuilder::new(&reader, &metrics)
.build_array_reader(fields.as_deref(), &projection)?;
- let read_plan = plan_builder
+ let mut plan_builder = plan_builder
.limited(reader.num_rows())
.with_offset(offset)
.with_limit(limit)
- .build_limited()
- .build();
+ .build_limited();
+
+ let preferred_strategy = plan_builder.preferred_selection_strategy();
Review Comment:
Elsewhere in the PR you used the term "resolved" to reflect the chosen
strategy -- perhaps we can do the same here
```rust
let resolved_strategy = plan_builder.resolve_selection_strategy();
plan_builder =
plan_builder.with_selection_strategy(resolved_strategy);
```
I also found it slightly strange that the PlanBuilder has both methods --
maybe it would be simpler if build() simply resolved the strategy directly.
However, I think we can do this as a follow on PR
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]