kosiew commented on code in PR #19678:
URL: https://github.com/apache/datafusion/pull/19678#discussion_r2681365222
##########
datafusion/datasource-parquet/src/row_filter.rs:
##########
@@ -411,9 +412,13 @@ fn pushdown_columns(
let allow_list_columns = supports_list_predicates(expr);
let mut checker = PushdownChecker::new(file_schema, allow_list_columns);
expr.visit(&mut checker)?;
- Ok((!checker.prevents_pushdown()).then_some(PushdownColumns {
- required_columns: checker.required_columns,
- nested: checker.nested_behavior,
+ let prevents_pushdown = checker.prevents_pushdown();
+ let nested = checker.nested_behavior;
+ let mut required_columns = checker.required_columns;
+ required_columns.sort_unstable();
Review Comment:
how about adding:
```rust
impl PushdownChecker {
fn into_sorted_columns(mut self) -> PushdownColumns {
self.required_columns.sort_unstable();
self.required_columns.dedup(); // this removes the need for contains
check
PushdownColumns {
required_columns: self.required_columns,
nested: self.nested_behavior,
}
}
```
##########
datafusion/datasource-parquet/src/row_filter.rs:
##########
@@ -390,9 +392,8 @@ enum NestedColumnSupport {
Unsupported,
}
-#[derive(Debug)]
Review Comment:
Is there a reason to remove
#[derive(Debug)]
?
##########
datafusion/datasource-parquet/src/row_filter.rs:
##########
@@ -307,7 +307,9 @@ impl<'schema> PushdownChecker<'schema> {
}
};
- self.required_columns.insert(idx);
+ if !self.required_columns.contains(&idx) {
+ self.required_columns.push(idx);
+ }
Review Comment:
I think it would help future contributors to
- Add comment explaining linear search is acceptable for small `n`
- OR switch to `HashSet` for O(1) deduplication if `n` might grow
##########
datafusion/datasource-parquet/src/row_filter.rs:
##########
@@ -307,7 +307,9 @@ impl<'schema> PushdownChecker<'schema> {
}
};
- self.required_columns.insert(idx);
+ if !self.required_columns.contains(&idx) {
+ self.required_columns.push(idx);
+ }
Review Comment:
see suggested
fn into_sorted_columns
below
##########
datafusion/datasource-parquet/src/row_filter.rs:
##########
@@ -411,9 +412,13 @@ fn pushdown_columns(
let allow_list_columns = supports_list_predicates(expr);
let mut checker = PushdownChecker::new(file_schema, allow_list_columns);
expr.visit(&mut checker)?;
- Ok((!checker.prevents_pushdown()).then_some(PushdownColumns {
- required_columns: checker.required_columns,
- nested: checker.nested_behavior,
+ let prevents_pushdown = checker.prevents_pushdown();
+ let nested = checker.nested_behavior;
+ let mut required_columns = checker.required_columns;
+ required_columns.sort_unstable();
+ Ok((!prevents_pushdown).then_some(PushdownColumns {
+ required_columns,
+ nested,
}))
Review Comment:
and then rewriting this to:
```rust
Ok((!checker.prevents_pushdown())
.then_some(checker.into_sorted_columns()))
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]