sagarlakshmipathy opened a new issue, #180:
URL: https://github.com/apache/arrow-datafusion-comet/issues/180
### Describe the bug
While running Comet with OSS Spark, I noticed warning messages on some queries saying that `Comet native execution is disabled`. I'd like to understand why.
Here's the execution log:
```
====================================================================================================
RUNNING: Query # 15 (round 1) (1 statements)
----------------------------------------------------------------------------------------------------
24/03/09 23:16:27 WARN QueryPlanSerde: Comet native execution is disabled due to: unsupported Spark expression: 'might_contain(Subquery subquery#8915, [id=#74608], xxhash64(cs_sold_date_sk#277, 42))' of class 'org.apache.spark.sql.catalyst.expressions.BloomFilterMightContain
24/03/09 23:16:27 WARN QueryPlanSerde: Comet native execution is disabled due to: unsupported Spark expression: 'might_contain(Subquery subquery#8915, [id=#74608], xxhash64(cs_sold_date_sk#277, 42))' of class 'org.apache.spark.sql.catalyst.expressions.BloomFilterMightContain
24/03/09 23:16:27 WARN DAGScheduler: Broadcasting large task binary with size 1047.8 KiB
24/03/09 23:16:33 WARN DAGScheduler: Broadcasting large task binary with size 1096.7 KiB
24/03/09 23:16:33 WARN DAGScheduler: Broadcasting large task binary with size 1143.9 KiB
24/03/09 23:16:35 WARN DAGScheduler: Broadcasting large task binary with size 1131.6 KiB
Time taken: 8596 ms
----------------------------------------------------------------------------------------------------
FINISHED: Query # 15 (round 1)
====================================================================================================
```
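In case it helps triage: when a run prints many of these warnings, the unsupported expression classes can be pulled out of the log automatically. This is a minimal sketch that assumes the warning keeps exactly the wording shown above; `FALLBACK_RE`, `unsupported_expressions`, and `SAMPLE_LOG` are hypothetical names, not part of Comet or Spark.

```python
import re

# Hypothetical pattern for Comet's fallback warning, written against the log in
# this issue. The logged message has no closing quote after the class name, so
# the class is captured as a run of Java identifier characters instead.
FALLBACK_RE = re.compile(
    r"Comet native execution is disabled due to: "
    r"unsupported Spark expression: '(?P<expr>.*?)' "
    r"of class '(?P<cls>[\w.$]+)"
)


def unsupported_expressions(log_text: str) -> list[tuple[str, str]]:
    """Return unique (expression, class name) pairs from Comet fallback warnings.

    Whitespace runs are collapsed to single spaces first, so a warning that was
    hard-wrapped across several lines still matches.
    """
    flat = " ".join(log_text.split())
    pairs: list[tuple[str, str]] = []
    for match in FALLBACK_RE.finditer(flat):
        pair = (match.group("expr"), match.group("cls"))
        if pair not in pairs:  # the same warning can be logged more than once
            pairs.append(pair)
    return pairs


# Excerpt of the Q15 log, hard-wrapped the way it arrived by e-mail.
SAMPLE_LOG = """\
24/03/09 23:16:27 WARN QueryPlanSerde: Comet native execution is disabled
due to: unsupported Spark expression: 'might_contain(Subquery subquery#8915,
[id=#74608], xxhash64(cs_sold_date_sk#277, 42))' of class
'org.apache.spark.sql.catalyst.expressions.BloomFilterMightContain
24/03/09 23:16:27 WARN QueryPlanSerde: Comet native execution is disabled
due to: unsupported Spark expression: 'might_contain(Subquery subquery#8915,
[id=#74608], xxhash64(cs_sold_date_sk#277, 42))' of class
'org.apache.spark.sql.catalyst.expressions.BloomFilterMightContain
24/03/09 23:16:27 WARN DAGScheduler: Broadcasting large task binary with
size 1047.8 KiB
"""

if __name__ == "__main__":
    # Print the short class name next to the offending expression.
    for expr, cls in unsupported_expressions(SAMPLE_LOG):
        print(f"{cls.rsplit('.', 1)[-1]}: {expr}")
```

Collapsing whitespace trades exactness for robustness: it also normalizes any repeated spaces inside an expression, which is fine for spotting which classes fall back.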
Here's the query itself:
```
--TPC-DS Q15
select ca_zip
,sum(cs_sales_price)
from catalog_sales
,customer
,customer_address
,date_dim
where cs_bill_customer_sk = c_customer_sk
and c_current_addr_sk = ca_address_sk
and ( substr(ca_zip,1,5) in ('85669', '86197','88274','83405','86475',
'85392', '85460', '80348', '81792')
or ca_state in ('CA','WA','GA')
or cs_sales_price > 500)
and cs_sold_date_sk = d_date_sk
and d_qoy = 2 and d_year = 2002
group by ca_zip
order by ca_zip
limit 100;
```
Despite the warnings, the queries still ran faster.
### Steps to reproduce
1. Run a TPC-DS query test; running just query 15 should be enough.

Apologies for the minimal steps here. Fortunately, that's all that's needed.
### Expected behavior
No WARN messages
### Additional context
This only happened for some queries. For example, Q46 ran without any issues.
```
====================================================================================================
RUNNING: Query # 46 (round 1) (1 statements)
----------------------------------------------------------------------------------------------------
Time taken: 18658 ms
----------------------------------------------------------------------------------------------------
FINISHED: Query # 46 (round 1)
====================================================================================================
```
```
--TPC-DS Q46
select c_last_name
,c_first_name
,ca_city
,bought_city
,ss_ticket_number
,amt,profit
from
(select ss_ticket_number
,ss_customer_sk
,ca_city bought_city
,sum(ss_coupon_amt) amt
,sum(ss_net_profit) profit
from store_sales,date_dim,store,household_demographics,customer_address
where store_sales.ss_sold_date_sk = date_dim.d_date_sk
and store_sales.ss_store_sk = store.s_store_sk
and store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
and store_sales.ss_addr_sk = customer_address.ca_address_sk
and (household_demographics.hd_dep_count = 3 or
household_demographics.hd_vehicle_count= 1)
and date_dim.d_dow in (6,0)
and date_dim.d_year in (1999,1999+1,1999+2)
and store.s_city in ('Midway','Fairview','Fairview','Midway','Fairview')
group by ss_ticket_number,ss_customer_sk,ss_addr_sk,ca_city)
dn,customer,customer_address current_addr
where ss_customer_sk = c_customer_sk
and customer.c_current_addr_sk = current_addr.ca_address_sk
and current_addr.ca_city <> bought_city
order by c_last_name
,c_first_name
,ca_city
,bought_city
,ss_ticket_number
limit 100;
```
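Since only some queries fall back (Q15 does, Q46 does not), a per-query summary makes a full TPC-DS run easier to check. This sketch assumes the benchmark output marks each query with a `RUNNING: Query # N` line, as in the logs above; `queries_with_fallback` and `SAMPLE_OUTPUT` are hypothetical names, not part of Comet or the benchmark harness.

```python
import re

# Hypothetical helper; assumes each query's output starts with a line like
# "RUNNING: Query # 15 (round 1) (1 statements)", as in the logs in this issue.
QUERY_START_RE = re.compile(r"RUNNING: Query # (\d+)")
FALLBACK_MARKER = "Comet native execution is disabled"


def queries_with_fallback(bench_output: str) -> dict[int, bool]:
    """Map each query number to True if a Comet fallback warning was logged
    between its RUNNING marker and the next one, else False.

    Text before the first RUNNING marker is ignored. If a query runs in several
    rounds, it is flagged when any round logged the warning.
    """
    starts = list(QUERY_START_RE.finditer(bench_output))
    result: dict[int, bool] = {}
    for i, match in enumerate(starts):
        end = starts[i + 1].start() if i + 1 < len(starts) else len(bench_output)
        # Collapse whitespace so a wrapped warning line still matches the marker.
        section = " ".join(bench_output[match.end():end].split())
        query = int(match.group(1))
        result[query] = result.get(query, False) or FALLBACK_MARKER in section
    return result


# Abbreviated version of the Q15 and Q46 output shown above.
SAMPLE_OUTPUT = """\
RUNNING: Query # 15 (round 1) (1 statements)
24/03/09 23:16:27 WARN QueryPlanSerde: Comet native execution is disabled
Time taken: 8596 ms
FINISHED: Query # 15 (round 1)
RUNNING: Query # 46 (round 1) (1 statements)
Time taken: 18658 ms
FINISHED: Query # 46 (round 1)
"""

if __name__ == "__main__":
    for query, fell_back in sorted(queries_with_fallback(SAMPLE_OUTPUT).items()):
        print(f"Q{query}: {'fallback warning' if fell_back else 'no warnings'}")
```

Splitting on the `RUNNING` markers rather than `FINISHED` keeps warnings that Spark prints before a query's first log line attached to the right query.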