alamb commented on code in PR #18868:
URL: https://github.com/apache/datafusion/pull/18868#discussion_r2600089947
##########
datafusion/optimizer/src/push_down_limit.rs:
##########
@@ -32,12 +33,17 @@ use datafusion_expr::{lit, FetchType, SkipType};
/// Optimization rule that tries to push down `LIMIT`.
//. It will push down through projection, limits (taking the smaller limit)
#[derive(Default, Debug)]
-pub struct PushDownLimit {}
+pub struct PushDownLimit {
+ /// Flag to track whether we're currently under a Sort node that requires
order preservation
Review Comment:
I think there is only one `PushDownLimit` instance per `SessionContext`, so
using `AtomicBool` here i think means that planning multiple concurrent queries
will interfere with each other (can set this bool back / forth)
I looked around and it seems like all the other logical optimizer rules
don't push state down either so I don't have a great explanation for how to
change this
##########
datafusion/core/tests/parquet/mod.rs:
##########
@@ -140,9 +140,10 @@ impl TestOutput {
})
}
- fn pruning_metric(&self, metric_name: &str) -> Option<(usize, usize)> {
+ fn pruning_metric(&self, metric_name: &str) -> Option<(usize, usize,
usize)> {
Review Comment:
I think it would help to document what these returned values were.
Or maybe return a struct like
```rust
struct PruningMetric {
total_pruned: usize,
total_matched: usize,
total_fully_matched: usize,
}
```
##########
datafusion/datasource-parquet/src/row_group_filter.rs:
##########
@@ -46,13 +48,19 @@ use parquet::{
pub struct RowGroupAccessPlanFilter {
/// which row groups should be accessed
access_plan: ParquetAccessPlan,
+ /// which row groups are fully contained within the pruning predicate
+ is_fully_matched: Vec<bool>,
Review Comment:
Sorry I meant add a comment like
```rust
/// row groups where ALL rows are known to match the pruning predicate
```
##########
datafusion/optimizer/src/push_down_limit.rs:
##########
@@ -32,12 +33,17 @@ use datafusion_expr::{lit, FetchType, SkipType};
/// Optimization rule that tries to push down `LIMIT`.
//. It will push down through projection, limits (taking the smaller limit)
#[derive(Default, Debug)]
-pub struct PushDownLimit {}
+pub struct PushDownLimit {
+ /// Flag to track whether we're currently under a Sort node that requires
order preservation
Review Comment:
Looking around more, it seems like maybe @zhuqi-lucas 's new physical
optimizer rule in
- https://github.com/apache/datafusion/pull/19064
might be a more straightforward way to go
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]