sagarlakshmipathy opened a new issue, #182:
URL: https://github.com/apache/arrow-datafusion-comet/issues/182
### Describe the bug
I am running TPC-DS on Comet + Spark. Query 91 in particular throws this error:
```
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2785)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2721)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2720)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2720)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1206)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1206)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1206)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2984)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2923)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2912)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
Caused by: org.apache.comet.CometNativeException: General execution error with reason java.lang.RuntimeException: java.io.IOException: Could not read object from config with key parquet.private.read.filter.predicate
Caused by: java.io.IOException: Could not read object from config with key parquet.private.read.filter.predicate
Caused by: java.lang.ClassNotFoundException: org.apache.comet.parquet.ParquetFilters$$anon$1.
    at org.apache.comet.Native.executePlan(Native Method)
    at org.apache.comet.CometExecIterator.executeNative(CometExecIterator.scala:71)
    at org.apache.comet.CometExecIterator.getNextBatch(CometExecIterator.scala:123)
    at org.apache.comet.CometExecIterator.hasNext(CometExecIterator.scala:138)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
    at org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec.$anonfun$doExecute$1(ObjectHashAggregateExec.scala:92)
    at org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec.$anonfun$doExecute$1$adapted(ObjectHashAggregateExec.scala:90)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2(RDD.scala:875)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndexInternal$2$adapted(RDD.scala:875)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
    at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:101)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
    at org.apache.spark.scheduler.Task.run(Task.scala:139)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
```
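As far as I can tell (this is my reading of the Parquet reader path, not something taken from the Comet source), the config key in the error is Parquet's `ParquetInputFormat.FILTER_PREDICATE`: the pushed-down row-group filter is Java-serialized into the Hadoop config on the write side and deserialized again on the read side. A minimal sketch of that round trip, assuming the parquet-hadoop `SerializationUtil` API:
```scala
import org.apache.hadoop.conf.Configuration
import org.apache.parquet.filter2.predicate.FilterApi
import org.apache.parquet.hadoop.ParquetInputFormat
import org.apache.parquet.hadoop.util.SerializationUtil

object FilterRoundTrip extends App {
  // A concrete predicate comparable to the pushed-down d_year = 2002 filter.
  val predicate = FilterApi.`eq`(FilterApi.intColumn("d_year"), Integer.valueOf(2002))

  // Write side: the predicate is Java-serialized and stored base64-encoded in
  // the Hadoop config under "parquet.private.read.filter.predicate".
  val conf = new Configuration()
  SerializationUtil.writeObjectToConfAsBase64(
    ParquetInputFormat.FILTER_PREDICATE, predicate, conf)

  // Read side: deserialization must re-load the predicate's concrete class.
  // If that class is an anonymous one (like ParquetFilters$$anon$1 above) and
  // is not visible to the classloader doing the read, this is where
  // "Could not read object from config" / ClassNotFoundException would arise.
  val restored = SerializationUtil.readObjectFromConfAsBase64[AnyRef](
    ParquetInputFormat.FILTER_PREDICATE, conf)
  println(restored)
}
```
If that reading is right, deserializing an anonymous class like `ParquetFilters$$anon$1` requires that exact class on the deserializing classloader's classpath, which might explain why the failure surfaces inside the native `executePlan` call.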
I see that you have run the TPC-DS benchmark tests, but I couldn't find the TPC-DS queries in the resources directory of the repo. Can you point me to the queries that worked?
### Steps to reproduce
Run TPC-DS Query 91 with Comet enabled (a sketch of my session setup follows the query):
```
--TPC-DS Q91
select
cc_call_center_id Call_Center,
cc_name Call_Center_Name,
cc_manager Manager,
sum(cr_net_loss) Returns_Loss
from
call_center,
catalog_returns,
date_dim,
customer,
customer_address,
customer_demographics,
household_demographics
where
cr_call_center_sk = cc_call_center_sk
and cr_returned_date_sk = d_date_sk
and cr_returning_customer_sk = c_customer_sk
and cd_demo_sk = c_current_cdemo_sk
and hd_demo_sk = c_current_hdemo_sk
and ca_address_sk = c_current_addr_sk
and d_year = 2002
and d_moy = 11
and ((cd_marital_status = 'M' and cd_education_status = 'Unknown')
  or (cd_marital_status = 'W' and cd_education_status = 'Advanced Degree'))
and hd_buy_potential like 'Unknown%'
and ca_gmt_offset = -6
group by
cc_call_center_id, cc_name, cc_manager, cd_marital_status, cd_education_status
order by sum(cr_net_loss) desc;
```
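For reference, a minimal sketch of how I enable Comet for the run (the file name is a placeholder and the config names are taken from the Comet README, so treat this as an approximation of my setup rather than an exact reproduction script):
```scala
import org.apache.spark.sql.SparkSession

// Approximate session setup; the Comet jar must also be on the driver and
// executor classpaths (e.g. passed via --jars <path-to-comet-jar>).
val spark = SparkSession.builder()
  .appName("tpcds-q91")
  .config("spark.sql.extensions", "org.apache.comet.CometSparkSessionExtensions")
  .config("spark.comet.enabled", "true")
  .config("spark.comet.exec.enabled", "true")
  .getOrCreate()

// q91.sql holds the query text above; the TPC-DS tables are assumed to be
// registered already (e.g. as external tables over the generated data).
val q91 = scala.io.Source.fromFile("q91.sql").mkString
spark.sql(q91).show()
```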
### Expected behavior
The query should pass successfully.
### Additional context
_No response_