andygrove commented on code in PR #613: URL: https://github.com/apache/datafusion-comet/pull/613#discussion_r1672490835
########## spark/src/test/scala/org/apache/spark/sql/CometTPCDSQuerySuite.scala: ########## @@ -112,7 +108,9 @@ class CometTPCDSQuerySuite "q69", "q70", "q71", - "q72", + // TODO: unknown failure (seems memory usage over Github action runner) in CI with q72 in + // https://github.com/apache/datafusion-comet/pull/613. + // "q72", Review Comment: This is the version I have been using locally. Since we are not aiming to run the official TPC-DS benchmarks, but just our derived benchmarks, and also given that we are comparing Spark to Comet for the same queries, I think this would be fine to use by default as long it is well documented in our benchmarking guide. I do think we should still test with the original q72 as a separate exercise though, because if Spark can run it then Comet should be able to as well (with the same memory configuration). ```sql select i_item_desc ,w_warehouse_name ,d1.d_week_seq ,sum(case when p_promo_sk is null then 1 else 0 end) no_promo ,sum(case when p_promo_sk is not null then 1 else 0 end) promo ,count(*) total_cnt from catalog_sales join date_dim d1 on (cs_sold_date_sk = d1.d_date_sk) join customer_demographics on (cs_bill_cdemo_sk = cd_demo_sk) join household_demographics on (cs_bill_hdemo_sk = hd_demo_sk) join item on (i_item_sk = cs_item_sk) join inventory on (cs_item_sk = inv_item_sk) join warehouse on (w_warehouse_sk=inv_warehouse_sk) join date_dim d2 on (inv_date_sk = d2.d_date_sk) join date_dim d3 on (cs_ship_date_sk = d3.d_date_sk) left outer join promotion on (cs_promo_sk=p_promo_sk) left outer join catalog_returns on (cr_item_sk = cs_item_sk and cr_order_number = cs_order_number) where d1.d_week_seq = d2.d_week_seq and inv_quantity_on_hand < cs_quantity and d3.d_date > d1.d_date + 5 and hd_buy_potential = '501-1000' and d1.d_year = 1999 and cd_marital_status = 'S' group by i_item_desc,w_warehouse_name,d1.d_week_seq order by total_cnt desc, i_item_desc, w_warehouse_name, d_week_seq LIMIT 100; ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org