alamb opened a new issue, #7949:
URL: https://github.com/apache/arrow-datafusion/issues/7949

   ### Describe the bug
   
   The join order chosen for TPCH query 17 is bad making datafusion take more 
than 2x longer to execute the query
   
   
   
   ### To Reproduce
   
   
   ## Create Data:
   
   ```shell
   cd arrow-datafusion/benchmarks
   ./bench.sh  data tpch10
   ```
   
   ## Run query with datafusion-cli:
   
   ```shell
   cd arrow-datafusion/benchmarks/data/tpch_sf10
   datafusion-cli -c "select    sum(l_extendedprice) / 7.0 as avg_yearly        
from    lineitem,       part    where   p_partkey = l_partkey   and p_brand = 
'Brand#23'        and p_container = 'MED BOX'     and l_quantity < (      
select  0.2 * avg(l_quantity)   from    lineitem        where   l_partkey = 
p_partkey   );"
   ```
   
   Takes 7.52 seconds
   ```
   +-------------------+
   | avg_yearly        |
   +-------------------+
   | 3295493.512857143 |
   +-------------------+
   1 row in set. Query took 7.525 seconds.
   ```
   
   However, if we change the query slightly (swap the table order) it is much 
faster (4.5 seconds)
   
   ```shell
   cd arrow-datafusion/benchmarks/data/tpch_sf10
   datafusion-cli -c "select   sum(l_extendedprice) / 7.0 as avg_yearly        
from    part,lineitem   where   p_partkey = l_partkey   and p_brand = 
'Brand#23'        and p_container = 'MED BOX' and l_quantity < (      select  
0.2 * avg(l_quantity)   from    lineitem        where   l_partkey = p_partkey   
);"
   ```
   
   ```
   DataFusion CLI v32.0.0
   +-------------------+
   | avg_yearly        |
   +-------------------+
   | 3295493.512857143 |
   +-------------------+
   1 row in set. Query took 4.560 seconds.
   ```
   
   Here is the difference:
   ```diff
   select
   sum(l_extendedprice) / 7.0 as avg_yearly from
   - lineitem, part
   + part, lineitem
   where
     p_partkey = l_partkey
     and p_brand = 'Brand#23'
     and p_container = 'MED BOX'
     and l_quantity < ( select  0.2 * avg(l_quantity)   from    lineitem where  
l_partkey = p_partkey   );
   ```
   
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to