alamb opened a new issue, #7949:
URL: https://github.com/apache/arrow-datafusion/issues/7949
### Describe the bug
The join order chosen for TPCH query 17 is bad making datafusion take more
than 2x longer to execute the query
### To Reproduce
## Create Data:
```shell
cd arrow-datafusion/benchmarks
./bench.sh data tpch10
```
## Run query with datafusion-cli:
```shell
cd arrow-datafusion/benchmarks/data/tpch_sf10
datafusion-cli -c "select sum(l_extendedprice) / 7.0 as avg_yearly
from lineitem, part where p_partkey = l_partkey and p_brand =
'Brand#23' and p_container = 'MED BOX' and l_quantity < (
select 0.2 * avg(l_quantity) from lineitem where l_partkey =
p_partkey );"
```
Takes 7.52 seconds
```
+-------------------+
| avg_yearly |
+-------------------+
| 3295493.512857143 |
+-------------------+
1 row in set. Query took 7.525 seconds.
```
However, if we change the query slightly (swap the table order) it is much
faster (4.5 seconds)
```shell
cd arrow-datafusion/benchmarks/data/tpch_sf10
datafusion-cli -c "select sum(l_extendedprice) / 7.0 as avg_yearly
from part,lineitem where p_partkey = l_partkey and p_brand =
'Brand#23' and p_container = 'MED BOX' and l_quantity < ( select
0.2 * avg(l_quantity) from lineitem where l_partkey = p_partkey
);"
```
```
DataFusion CLI v32.0.0
+-------------------+
| avg_yearly |
+-------------------+
| 3295493.512857143 |
+-------------------+
1 row in set. Query took 4.560 seconds.
```
Here is the difference:
```diff
select
sum(l_extendedprice) / 7.0 as avg_yearly from
- lineitem, part
+ part, lineitem
where
p_partkey = l_partkey
and p_brand = 'Brand#23'
and p_container = 'MED BOX'
and l_quantity < ( select 0.2 * avg(l_quantity) from lineitem where
l_partkey = p_partkey );
```
### Expected behavior
_No response_
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]