alamb opened a new issue, #17259:
URL: https://github.com/apache/datafusion/issues/17259

   ### Is your feature request related to a problem or challenge?
   
   @MrPowers reported on Discord: 
https://discord.com/channels/885562378132000778/1290751484807352412/1407568961561952277
   
   > I ran the TPC-H queries on my Macbook M3 with 16GB of RAM with different 
scale factors.  The DataFusion, DuckDB, and Polars Streaming results are 
similar for scale factor 5:
   
   <img width="1070" height="850" alt="Image" 
src="https://github.com/user-attachments/assets/21aba060-b4b3-4f0b-85ba-66e56c48f1a3" />
   
   <img width="1054" height="844" alt="Image" 
src="https://github.com/user-attachments/assets/a7813828-2d3f-4d3d-bb29-9373841ce9ac" />
   
   ### Describe the solution you'd like
   
   Figure out why q4, q7, and q9 are very slow.
   
   The TPC-H queries are here: 
https://github.com/apache/datafusion/tree/main/benchmarks/queries
   
   
   
[q4](https://github.com/apache/datafusion/blob/main/benchmarks/queries/q4.sql)
   ```sql
   select
       o_orderpriority,
       count(*) as order_count
   from
       orders
   where
           o_orderdate >= '1993-07-01'
     and o_orderdate < date '1993-07-01' + interval '3' month
     and exists (
           select
               *
           from
               lineitem
           where
                   l_orderkey = o_orderkey
             and l_commitdate < l_receiptdate
       )
   group by
       o_orderpriority
   order by
       o_orderpriority;
   ```
   
   
[q7](https://github.com/apache/datafusion/blob/main/benchmarks/queries/q7.sql)
   
   ```sql
   select
       supp_nation,
       cust_nation,
       l_year,
       sum(volume) as revenue
   from
       (
           select
               n1.n_name as supp_nation,
               n2.n_name as cust_nation,
               extract(year from l_shipdate) as l_year,
               l_extendedprice * (1 - l_discount) as volume
           from
               supplier,
               lineitem,
               orders,
               customer,
               nation n1,
               nation n2
           where
                   s_suppkey = l_suppkey
             and o_orderkey = l_orderkey
             and c_custkey = o_custkey
             and s_nationkey = n1.n_nationkey
             and c_nationkey = n2.n_nationkey
             and (
                   (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY')
                   or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')
               )
             and l_shipdate between date '1995-01-01' and date '1996-12-31'
       ) as shipping
   group by
       supp_nation,
       cust_nation,
       l_year
   order by
       supp_nation,
       cust_nation,
       l_year;
   ```
   
   
[q9](https://github.com/apache/datafusion/blob/main/benchmarks/queries/q9.sql)
   
   ```sql
   select
       nation,
       o_year,
       sum(amount) as sum_profit
   from
       (
           select
               n_name as nation,
               extract(year from o_orderdate) as o_year,
               l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
           from
               part,
               supplier,
               lineitem,
               partsupp,
               orders,
               nation
           where
                   s_suppkey = l_suppkey
             and ps_suppkey = l_suppkey
             and ps_partkey = l_partkey
             and p_partkey = l_partkey
             and o_orderkey = l_orderkey
             and s_nationkey = n_nationkey
             and p_name like '%green%'
       ) as profit
   group by
       nation,
       o_year
   order by
       nation,
       o_year desc;
   ```
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

