mateuszkj opened a new issue, #2270:
URL: https://github.com/apache/arrow-datafusion/issues/2270
**Describe the bug**
Filters are not pushed down through `SubqueryAlias` to `TableScan` during
logical plan optimization. This can cause unnecessary IO, because the scan
cannot use the filter to prune Parquet files.
**To Reproduce**
Steps to reproduce the behavior:
Prepare data and run datafusion-cli with logs:
```sh
echo "1,2" > data.csv
export RUST_LOG=info,datafusion=debug
datafusion-cli
```
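The steps here do not show how the table `foo` is registered. A minimal sketch, assuming `foo` is an external CSV table over `data.csv` with two integer columns (the column names `a` and `b` come from the queries below; the column types and the exact statement are assumptions, not copied from the original session):

```sql
❯ CREATE EXTERNAL TABLE foo (a INT, b INT) STORED AS CSV LOCATION 'data.csv';
```

`WITH HEADER ROW` is left out because `data.csv` as created above has no header line.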
Run query without alias (`partial_filters` is added for `TableScan`):
```sql
❯ SELECT b FROM foo WHERE a=1;
[2022-04-18T21:16:26Z DEBUG datafusion::execution::context] Input logical plan:
Projection: #foo.b
  Filter: #foo.a = Int64(1)
    TableScan: foo projection=None
[2022-04-18T21:16:26Z DEBUG datafusion::execution::context] Optimized logical plan:
Projection: #foo.b
  Filter: #foo.a = Int64(1)
    TableScan: foo projection=Some([0, 1]), partial_filters=[#foo.a = Int64(1)]
```
Run query with alias (`partial_filters` is not added for `TableScan`):
```sql
❯ SELECT a.b FROM foo a WHERE a.a = 1;
[2022-04-18T21:16:38Z DEBUG datafusion::execution::context] Input logical plan:
Projection: #a.b
  Filter: #a.a = Int64(1)
    SubqueryAlias: a
      TableScan: foo projection=None
[2022-04-18T21:16:38Z DEBUG datafusion::execution::context] Optimized logical plan:
Projection: #a.b
  Filter: #a.a = Int64(1)
    SubqueryAlias: a
      TableScan: foo projection=Some([0, 1])
```
**Expected behavior**
`partial_filters` should be pushed down to `TableScan`:
```sql
❯ SELECT a.b FROM foo a WHERE a.a = 1;
[2022-04-18T21:16:38Z DEBUG datafusion::execution::context] Input logical plan:
Projection: #a.b
  Filter: #a.a = Int64(1)
    SubqueryAlias: a
      TableScan: foo projection=None
[2022-04-18T21:16:38Z DEBUG datafusion::execution::context] Optimized logical plan:
Projection: #a.b
  Filter: #a.a = Int64(1)
    SubqueryAlias: a
      TableScan: foo projection=Some([0, 1]), partial_filters=[#foo.a = Int64(1)]
```
**Additional context**
Tested on the master branch at commit 5f0b61b0db9849336e2e83b23c8a45508a85fb38.
I think the `SubqueryAlias` case is not handled in this file:
https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/src/optimizer/filter_push_down.rs#L299=