andygrove commented on PR #873:
URL: https://github.com/apache/datafusion-comet/pull/873#issuecomment-2312669142
I ran into an issue trying to test this change with TPC-DS q72:

```
: org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 132 tasks (1025.8 MiB) is bigger than spark.driver.maxResultSize (1024.0 MiB)
```

I increased `spark.driver.maxResultSize` to 2GB, but then the query failed with:

```
24/08/27 08:02:24 INFO DAGScheduler: ShuffleMapStage 40 (collect at /home/andy/git/apache/datafusion-benchmarks/runners/datafusion-comet/tpcbench.py:87) failed in 23.369 s due to Job aborted due to stage failure: Task 6 in stage 40.0 failed 4 times, most recent failure: Lost task 6.3 in stage 40.0 (TID 374) (192.168.86.42 executor 0): org.apache.comet.CometNativeException: Resources exhausted: Failed to allocate additional 132224 bytes for HashJoinInput[0] with 4809854208 bytes already allocated for this reservation - 68761 bytes remain available for the total pool
```

I then switched to the Comet 0.2.0-rc1 jar with the same configs, and the query ran without error.
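To put the two error messages in proportion, here is a small arithmetic sketch using only the byte counts quoted in the logs. It assumes the usual binary units (1 MiB = 1024² bytes, 1 GiB = 1024³ bytes); the variable names are illustrative and not taken from Comet or Spark. It shows that the first failure overshot the driver limit by under 2 MiB, while the second came from a hash-join build side that had already reserved roughly 4.48 GiB of the native memory pool.

```python
# Byte figures copied from the two error messages; units assumed binary.
MIB = 1024 ** 2
GIB = 1024 ** 3

# Failure 1: serialized task results vs. spark.driver.maxResultSize (both in MiB).
serialized_results_mib = 1025.8
max_result_size_mib = 1024.0
over_limit_mib = serialized_results_mib - max_result_size_mib

# Failure 2: native memory pool reservation held by the hash join input.
already_reserved = 4_809_854_208  # bytes already allocated for HashJoinInput[0]
requested = 132_224               # additional bytes the join tried to allocate
remaining_in_pool = 68_761        # bytes still available in the whole pool

reserved_gib = already_reserved / GIB
shortfall = requested - remaining_in_pool  # how far the request exceeded what was left

print(f"results exceeded the driver limit by {over_limit_mib:.1f} MiB")
print(f"HashJoinInput[0] already held {reserved_gib:.2f} GiB")
print(f"the request was {shortfall} bytes more than the pool had left")
```

The small additional request is not the problem in itself; the pool was nearly exhausted by the single join reservation before it was made.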
