mateuszkj opened a new issue, #2270:
URL: https://github.com/apache/arrow-datafusion/issues/2270

   **Describe the bug**
   Filters are not pushed down through `SubqueryAlias` to `TableScan` during
   logical plan optimization. This can cause unnecessary I/O, because the scan
   cannot use the filter to prune parquet files.
   
   **To Reproduce**
   Steps to reproduce the behavior:
   
   Prepare data and run datafusion-cli with logs:
   ```sh
   echo "1,2" > data.csv
   export RUST_LOG=info,datafusion=debug
   datafusion-cli
   ```
   
   Run query without alias (`partial_filters` is added for `TableScan`):
   ```sql
   ❯ SELECT b FROM foo WHERE a=1;
   [2022-04-18T21:16:26Z DEBUG datafusion::execution::context] Input logical plan:
       Projection: #foo.b
         Filter: #foo.a = Int64(1)
           TableScan: foo projection=None
       
   [2022-04-18T21:16:26Z DEBUG datafusion::execution::context] Optimized logical plan:
       Projection: #foo.b
         Filter: #foo.a = Int64(1)
           TableScan: foo projection=Some([0, 1]), partial_filters=[#foo.a = Int64(1)]
   ```
   
   Run query with alias (`partial_filters` is not added for `TableScan`):
   ```sql
   ❯ SELECT a.b FROM foo a WHERE a.a = 1;
   [2022-04-18T21:16:38Z DEBUG datafusion::execution::context] Input logical plan:
       Projection: #a.b
         Filter: #a.a = Int64(1)
           SubqueryAlias: a
             TableScan: foo projection=None
       
   [2022-04-18T21:16:38Z DEBUG datafusion::execution::context] Optimized logical plan:
       Projection: #a.b
         Filter: #a.a = Int64(1)
           SubqueryAlias: a
             TableScan: foo projection=Some([0, 1])
   ```
   
   
   **Expected behavior**
   `partial_filters` should be pushed down to `TableScan`:
   
   ```sql
   ❯ SELECT a.b FROM foo a WHERE a.a = 1;
   [2022-04-18T21:16:38Z DEBUG datafusion::execution::context] Input logical plan:
       Projection: #a.b
         Filter: #a.a = Int64(1)
           SubqueryAlias: a
             TableScan: foo projection=None
       
   [2022-04-18T21:16:38Z DEBUG datafusion::execution::context] Optimized logical plan:
       Projection: #a.b
         Filter: #a.a = Int64(1)
           SubqueryAlias: a
             TableScan: foo projection=Some([0, 1]), partial_filters=[#foo.a = Int64(1)]
   ```
   
   **Additional context**
   
   Tested with master branch 5f0b61b0db9849336e2e83b23c8a45508a85fb38. I think
   the `SubqueryAlias` case is not handled in this file:
   https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/src/optimizer/filter_push_down.rs#L299=
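
   To make the expected rewrite concrete, here is a minimal sketch of pushing a filter through an alias node. It is a toy model, not DataFusion's code: `Plan`, `Predicate`, `push_to_scan`, `optimize` and the other names are hypothetical, and the real optimizer works on `LogicalPlan` and `Expr`. The point it illustrates is that the predicate's qualifier has to be rewritten from the alias (`a`) to the underlying table (`foo`) before it can be attached to the scan.

   ```rust
   // Toy model only: these types are hypothetical and are NOT DataFusion's API.

   /// A predicate of the form `qualifier.column = value`.
   #[derive(Debug, Clone, PartialEq)]
   struct Predicate {
       qualifier: String,
       column: String,
       value: i64,
   }

   #[derive(Debug, Clone, PartialEq)]
   enum Plan {
       TableScan { table: String, partial_filters: Vec<Predicate> },
       SubqueryAlias { alias: String, input: Box<Plan> },
       Filter { predicate: Predicate, input: Box<Plan> },
   }

   /// Name under which a plan's columns are visible to its parent.
   fn output_qualifier(plan: &Plan) -> &str {
       match plan {
           Plan::TableScan { table, .. } => table.as_str(),
           Plan::SubqueryAlias { alias, .. } => alias.as_str(),
           Plan::Filter { input, .. } => output_qualifier(input),
       }
   }

   /// Attach `pred` to the scan below `plan`, rewriting its qualifier at each alias.
   fn push_to_scan(plan: Plan, pred: Predicate) -> Plan {
       match plan {
           Plan::TableScan { table, mut partial_filters } => {
               if pred.qualifier == table {
                   partial_filters.push(pred);
               }
               Plan::TableScan { table, partial_filters }
           }
           Plan::SubqueryAlias { alias, input } => {
               let input = if pred.qualifier == alias {
                   // The step the issue says is missing: a.a = 1 becomes foo.a = 1.
                   let inner = Predicate {
                       qualifier: output_qualifier(&input).to_string(),
                       ..pred
                   };
                   Box::new(push_to_scan(*input, inner))
               } else {
                   input
               };
               Plan::SubqueryAlias { alias, input }
           }
           Plan::Filter { predicate, input } => Plan::Filter {
               predicate,
               input: Box::new(push_to_scan(*input, pred)),
           },
       }
   }

   /// Keep each `Filter` in place and also copy its predicate down to the scan.
   fn optimize(plan: Plan) -> Plan {
       match plan {
           Plan::Filter { predicate, input } => {
               let input = push_to_scan(optimize(*input), predicate.clone());
               Plan::Filter { predicate, input: Box::new(input) }
           }
           Plan::SubqueryAlias { alias, input } => Plan::SubqueryAlias {
               alias,
               input: Box::new(optimize(*input)),
           },
           scan => scan,
       }
   }

   /// Filters recorded on the scan at the bottom of the plan.
   fn scan_filters(plan: &Plan) -> &[Predicate] {
       match plan {
           Plan::TableScan { partial_filters, .. } => partial_filters.as_slice(),
           Plan::SubqueryAlias { input, .. } | Plan::Filter { input, .. } => scan_filters(input),
       }
   }

   /// Plan for `SELECT a.b FROM foo a WHERE a.a = 1` (projection omitted).
   fn aliased_query() -> Plan {
       Plan::Filter {
           predicate: Predicate { qualifier: "a".into(), column: "a".into(), value: 1 },
           input: Box::new(Plan::SubqueryAlias {
               alias: "a".into(),
               input: Box::new(Plan::TableScan { table: "foo".into(), partial_filters: vec![] }),
           }),
       }
   }

   fn main() {
       let optimized = optimize(aliased_query());
       println!("{:#?}", optimized);
   }
   ```

   In this sketch the `Filter` node stays above the alias, as in the expected plan above, and the scan additionally records the filter under the table's own name.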
   

