alamb opened a new issue #410:
URL: https://github.com/apache/arrow-datafusion/issues/410
**Is your feature request related to a problem or challenge? Please describe
what you are trying to do.**
Sometimes the way a query is constructed (e.g. via an automated
tool or rewrite) results in redundant filters (i.e. the exact same predicate
appearing twice). DataFusion could remove that redundancy.
Reproducer:
```
echo "1" > /tmp/foo.csv
cargo run -p datafusion-cli
```
And then in the datafusion CLI:
```
CREATE EXTERNAL TABLE foo(ts int)
STORED AS CSV
LOCATION '/tmp/foo.csv';
explain verbose select * from foo where ts>5 AND ts>5;
```
On master you can see that both predicates are still present:
```
|                                         |   FilterExec: CAST(ts AS Int64) > 5 AND CAST(ts AS Int64) > 5            |
```
Here is the entire plan:
```
> explain verbose select * from foo where ts>5 AND ts>5;
+-----------------------------------------+--------------------------------------------------------------------------+
| plan_type                               | plan                                                                     |
+-----------------------------------------+--------------------------------------------------------------------------+
| logical_plan                            | Projection: #ts                                                          |
|                                         |   Filter: #ts Gt Int64(5) And #ts Gt Int64(5)                            |
|                                         |     TableScan: foo projection=None                                       |
| logical_plan after projection_push_down | Projection: #ts                                                          |
|                                         |   Filter: #ts Gt Int64(5) And #ts Gt Int64(5)                            |
|                                         |     TableScan: foo projection=Some([0])                                  |
| logical_plan after projection_push_down | Projection: #ts                                                          |
|                                         |   Filter: #ts Gt Int64(5) And #ts Gt Int64(5)                            |
|                                         |     TableScan: foo projection=Some([0])                                  |
| physical_plan                           | ProjectionExec: expr=[ts]                                                |
|                                         |   FilterExec: CAST(ts AS Int64) > 5 AND CAST(ts AS Int64) > 5            |
|                                         |     CsvExec: source=Path(/tmp/foo.csv: [/tmp/foo.csv]), has_header=false |
+-----------------------------------------+--------------------------------------------------------------------------+
```
**Describe the solution you'd like**
I propose updating the existing filter pushdown rule so that it also
removes redundant filters in the same pass.
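To make the idea concrete, here is a minimal, self-contained sketch of the deduplication step. It is not DataFusion code: the `Expr` enum is a toy stand-in for the real expression type, and `split_conjunction`, `dedup_conjunction`, and `ts_gt_5` are hypothetical names chosen for illustration. It assumes that two structurally identical conjuncts are interchangeable, which would only hold for deterministic expressions.

```rust
// Toy expression type for illustration only; this is NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Int64(i64),
    Gt(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// Flatten nested `And` nodes into a flat list of conjuncts.
fn split_conjunction(expr: &Expr, out: &mut Vec<Expr>) {
    match expr {
        Expr::And(left, right) => {
            split_conjunction(left, out);
            split_conjunction(right, out);
        }
        other => out.push(other.clone()),
    }
}

/// Drop every conjunct that is structurally equal to an earlier one,
/// then rebuild a left-deep `And` chain from the survivors.
fn dedup_conjunction(expr: &Expr) -> Expr {
    let mut parts = Vec::new();
    split_conjunction(expr, &mut parts);

    // Linear scan keeps the first occurrence and preserves order.
    let mut unique: Vec<Expr> = Vec::new();
    for part in parts {
        if !unique.contains(&part) {
            unique.push(part);
        }
    }

    let mut iter = unique.into_iter();
    // `split_conjunction` always pushes at least one expression.
    let first = iter.next().expect("at least one conjunct");
    iter.fold(first, |acc, e| Expr::And(Box::new(acc), Box::new(e)))
}

/// Hypothetical helper that builds the predicate `ts > 5` from the reproducer.
fn ts_gt_5() -> Expr {
    Expr::Gt(
        Box::new(Expr::Column("ts".to_string())),
        Box::new(Expr::Int64(5)),
    )
}
```

Under these assumptions, calling `dedup_conjunction` on `And(ts_gt_5(), ts_gt_5())` would yield the single predicate `ts_gt_5()`, while conjunctions of distinct predicates keep all of their conjuncts in their original order.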
**Describe alternatives you've considered**
Alternatively, we could add an entirely separate optimizer rule that
removes the redundant filters.