alamb opened a new issue #410:
URL: https://github.com/apache/arrow-datafusion/issues/410


   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   Sometimes the way a query is constructed (e.g. via an automated tool or 
rewrite) results in redundant filters, i.e. the exact same predicate appearing 
twice. DataFusion could remove that redundancy.
   
   Reproducer:
   ```
   echo "1" > /tmp/foo.csv
   cargo run -p datafusion-cli
   ```
   
   And then in the datafusion CLI:
   ```
   CREATE EXTERNAL TABLE foo(ts int)
   STORED AS CSV
   LOCATION '/tmp/foo.csv';
   
   explain verbose select * from foo where ts>5 AND ts>5;
   ```
   
   
   On master, both predicates are still present in the physical plan:
   ```
   |                                         |   FilterExec: CAST(ts AS Int64) > 5 AND CAST(ts AS Int64) > 5            |
   ```
   
   Here is the entire plan:
   
   ```
   > explain verbose select * from foo where ts>5 AND ts>5;
   +-----------------------------------------+--------------------------------------------------------------------------+
   | plan_type                               | plan                                                                     |
   +-----------------------------------------+--------------------------------------------------------------------------+
   | logical_plan                            | Projection: #ts                                                          |
   |                                         |   Filter: #ts Gt Int64(5) And #ts Gt Int64(5)                            |
   |                                         |     TableScan: foo projection=None                                       |
   | logical_plan after projection_push_down | Projection: #ts                                                          |
   |                                         |   Filter: #ts Gt Int64(5) And #ts Gt Int64(5)                            |
   |                                         |     TableScan: foo projection=Some([0])                                  |
   | logical_plan after projection_push_down | Projection: #ts                                                          |
   |                                         |   Filter: #ts Gt Int64(5) And #ts Gt Int64(5)                            |
   |                                         |     TableScan: foo projection=Some([0])                                  |
   | physical_plan                           | ProjectionExec: expr=[ts]                                                |
   |                                         |   FilterExec: CAST(ts AS Int64) > 5 AND CAST(ts AS Int64) > 5            |
   |                                         |     CsvExec: source=Path(/tmp/foo.csv: [/tmp/foo.csv]), has_header=false |
   +-----------------------------------------+--------------------------------------------------------------------------+
   ```
   
   
   
   **Describe the solution you'd like**
   I propose updating the existing filter pushdown rule so that it also removes 
redundant filters.
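
To make the idea concrete, here is a minimal sketch of the deduplication step, assuming a simplified expression type. The `Expr` enum and the helpers `split_conjunction`, `dedup_conjuncts`, and `ts_gt` are hypothetical names invented for illustration; they are not DataFusion's actual types or APIs. The sketch flattens a predicate into its `AND`-ed conjuncts, drops exact duplicates, and rebuilds the predicate.

```rust
// Hypothetical, simplified expression type for illustration only;
// this is NOT DataFusion's actual `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Int64(i64),
    Gt(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// Flatten a tree of nested `And` nodes into its individual conjuncts.
fn split_conjunction(expr: &Expr, out: &mut Vec<Expr>) {
    match expr {
        Expr::And(left, right) => {
            split_conjunction(left, out);
            split_conjunction(right, out);
        }
        other => out.push(other.clone()),
    }
}

/// Drop exact duplicate conjuncts (keeping the first occurrence of each)
/// and rebuild a left-deep `And` chain from what remains.
fn dedup_conjuncts(predicate: &Expr) -> Expr {
    let mut conjuncts = Vec::new();
    split_conjunction(predicate, &mut conjuncts);

    // Linear scan keeps the original order and only needs `PartialEq`.
    let mut unique: Vec<Expr> = Vec::new();
    for conjunct in conjuncts {
        if !unique.contains(&conjunct) {
            unique.push(conjunct);
        }
    }

    let mut iter = unique.into_iter();
    let first = iter
        .next()
        .expect("split_conjunction always yields at least one conjunct");
    iter.fold(first, |acc, e| Expr::And(Box::new(acc), Box::new(e)))
}

/// Hypothetical helper that builds the predicate `ts > value`.
fn ts_gt(value: i64) -> Expr {
    Expr::Gt(
        Box::new(Expr::Column("ts".to_string())),
        Box::new(Expr::Int64(value)),
    )
}
```

With this sketch, calling `dedup_conjuncts` on `ts_gt(5) AND ts_gt(5)` yields the single predicate `ts_gt(5)`. The comparison is purely syntactic, so it only catches the "same exact predicate twice" case described above; predicates that are merely equivalent (for example `ts > 5` and `5 < ts`) are left alone.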
   
   **Describe alternatives you've considered**
   Alternatively, we could add an entirely separate optimizer rule that 
removes the redundant filters.
   
   



