stuartcarnie opened a new issue, #6072:
URL: https://github.com/apache/arrow-datafusion/issues/6072
### Describe the bug
Predicates that combine multiple `true` literals using a disjunction (`OR`)
followed by a conjunction (`AND`) are simplified incorrectly, producing
incorrect query results.
For example, executing the following query:
```sql
WITH t(time, cpu) AS (VALUES (0, 'cpu0'), (1, 'cpu1'))
SELECT * FROM t
WHERE (time = 0 OR time = 1) AND (true OR true AND cpu = 'cpu0');
```
produces the following incorrect results; the row with `cpu1` in the `cpu`
column should not be included:
```
+------+------+
| time | cpu  |
+------+------+
| 0    | cpu0 |
| 1    | cpu1 |
+------+------+
2 rows in set. Query took 0.004 seconds.
```
### To Reproduce
```sql
WITH t(time, cpu) AS (VALUES (0, 'cpu0'), (1, 'cpu1'))
SELECT * FROM t
WHERE (time = 0 OR time = 1) AND (true OR true AND cpu = 'cpu0');
```
### Expected behavior
Produce the following results:
```
+------+------+
| time | cpu  |
+------+------+
| 0    | cpu0 |
+------+------+
1 row in set. Query took 0.004 seconds.
```
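To make the two outputs above easier to compare, here is a minimal Python sketch that evaluates the `WHERE` clause for both rows under the two possible groupings of `true OR true AND cpu = 'cpu0'`. The function names `and_first` and `or_first` are hypothetical and exist only for this illustration; nothing here calls DataFusion. Note that standard SQL gives `AND` higher precedence than `OR`, which corresponds to the `and_first` reading.

```python
# Rows from the VALUES clause in the query above.
rows = [(0, "cpu0"), (1, "cpu1")]


def and_first(time, cpu):
    # Reading 1: AND binds tighter than OR (standard SQL precedence), i.e.
    # true OR (true AND cpu = 'cpu0').
    return (time == 0 or time == 1) and (True or (True and cpu == "cpu0"))


def or_first(time, cpu):
    # Reading 2: the disjunction is grouped first, i.e.
    # (true OR true) AND cpu = 'cpu0'.
    return (time == 0 or time == 1) and ((True or True) and cpu == "cpu0")


and_first_rows = [r for r in rows if and_first(*r)]
or_first_rows = [r for r in rows if or_first(*r)]

print("AND binds tighter:", and_first_rows)
print("OR grouped first: ", or_first_rows)
```

Under this sketch, the `and_first` reading keeps both rows (the output reported under "Describe the bug"), while the `or_first` reading keeps only the `cpu0` row (the output listed under "Expected behavior").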
### Additional context
Running `EXPLAIN VERBOSE` indicates the `simplify_expressions` rule is
incorrectly rewriting the filter node:
```text
❯ explain verbose WITH t(time, cpu) AS (VALUES (0, 'cpu0'), (1, 'cpu1')) SELECT * from t WHERE (time = 0 OR time = 1) AND (true OR true AND cpu = 'cpu0');
+-----------------------------------------+------------------------------------------------------------------------------------------------------------+
| plan_type                               | plan                                                                                                       |
+-----------------------------------------+------------------------------------------------------------------------------------------------------------+
| initial_logical_plan                    | Projection: time, cpu                                                                                      |
|                                         |   Filter: (time = Int64(0) OR time = Int64(1)) AND (Boolean(true) OR Boolean(true) AND cpu = Utf8("cpu0")) |
|                                         |     Projection: t.column1 AS time, t.column2 AS cpu                                                        |
|                                         |       SubqueryAlias: t                                                                                     |
|                                         |         Values: (Int64(0), Utf8("cpu0")), (Int64(1), Utf8("cpu1"))                                         |
| logical_plan after inline_table_scan    | SAME TEXT AS ABOVE                                                                                         |
| logical_plan after type_coercion        | SAME TEXT AS ABOVE                                                                                         |
| logical_plan after simplify_expressions | Projection: time, cpu                                                                                      |
|                                         |   Filter: time = Int64(0) OR time = Int64(1)                                                               |
|                                         |     Projection: t.column1 AS time, t.column2 AS cpu                                                        |
|                                         |       SubqueryAlias: t                                                                                     |
|                                         |         Values: (Int64(0), Utf8("cpu0")), (Int64(1), Utf8("cpu1"))                                         |
```
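To help trace how the filter could collapse to `time = Int64(0) OR time = Int64(1)`, here is a minimal Python sketch of constant folding over a small expression tree. It is an assumption-laden illustration, not DataFusion's `simplify_expressions` implementation: the tuple encoding, the `simplify` function, and the rewrite rules (generic boolean identities such as `true OR x = true` and `true AND x = x`) are all hypothetical. The tree is built assuming `AND` binds tighter than `OR`, as in standard SQL.

```python
TRUE = ("lit", True)
FALSE = ("lit", False)


def simplify(expr):
    """Fold boolean literals out of a nested ("and" | "or", left, right) tree."""
    op = expr[0]
    if op not in ("and", "or"):
        return expr  # literals and opaque predicates are left untouched
    left, right = simplify(expr[1]), simplify(expr[2])
    if op == "and":
        if FALSE in (left, right):
            return FALSE      # false AND x  =>  false
        if left == TRUE:
            return right      # true AND x   =>  x
        if right == TRUE:
            return left       # x AND true   =>  x
    else:
        if TRUE in (left, right):
            return TRUE       # true OR x    =>  true
        if left == FALSE:
            return right      # false OR x   =>  x
        if right == FALSE:
            return left       # x OR false   =>  x
    return (op, left, right)


# Hypothetical encoding of the filter, parsed with AND before OR:
# (time = 0 OR time = 1) AND (true OR (true AND cpu = 'cpu0'))
time_pred = ("or", ("pred", "time = 0"), ("pred", "time = 1"))
cpu_pred = ("pred", "cpu = 'cpu0'")
filter_expr = ("and", time_pred, ("or", TRUE, ("and", TRUE, cpu_pred)))

simplified = simplify(filter_expr)
print(simplified)
```

With this grouping, the second conjunct folds to `true` and only the `time` disjunction remains, which matches the filter shown after `simplify_expressions`. If the tree were instead built as `("and", ("or", TRUE, TRUE), cpu_pred)`, the same rules would leave `cpu = 'cpu0'` in the filter.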