Re: [PR] chore: Improve CometExchange metrics [datafusion-comet]

via GitHub Tue, 27 Aug 2024 07:07:55 -0700


andygrove commented on PR #873:
URL: https://github.com/apache/datafusion-comet/pull/873#issuecomment-2312669142


   I ran into an issue trying to test this change with TPC-DS q72:
   
   ```
   : org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 132 tasks (1025.8 MiB) is bigger than spark.driver.maxResultSize (1024.0 MiB)
   ```
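For context, the first failure is a driver-side guard: Spark aborts the job once the combined size of serialized task results exceeds `spark.driver.maxResultSize` (default `1g`; `0` means unlimited). The sketch below approximates that comparison in plain Python using the numbers from the log. `parse_size` and `exceeds_max_result_size` are hypothetical helpers, not Spark APIs, and the size parser is deliberately simplified to single-letter binary suffixes.

```python
# Hypothetical sketch of the driver-side result-size guard; not Spark's own code.
MIB = 1024 * 1024

# Simplified binary suffixes, in the style of Spark size strings such as "1g" or "512m".
SUFFIXES = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_size(value: str) -> int:
    """Convert a size string like '2g' into bytes (bare numbers are treated as bytes)."""
    value = value.strip().lower()
    if value and value[-1] in SUFFIXES:
        return int(value[:-1]) * SUFFIXES[value[-1]]
    return int(value)


def exceeds_max_result_size(total_result_bytes: int, max_result_size: str) -> bool:
    """Return True if the aggregated task results would trip the limit (0 = unlimited)."""
    limit = parse_size(max_result_size)
    return limit > 0 and total_result_bytes > limit


# 132 tasks produced 1025.8 MiB of serialized results, per the log above.
total = int(1025.8 * MIB)
print(exceeds_max_result_size(total, "1g"))  # True: 1025.8 MiB > 1024.0 MiB
print(exceeds_max_result_size(total, "2g"))  # False: fits under 2048 MiB
```

Raising the limit only moves the ceiling: the collected results still have to fit in driver memory, so it works around this query rather than addressing why the results are so large.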
   
   I increased `spark.driver.maxResultSize` to 2 GB, but the query then failed with:
   
   ```
   24/08/27 08:02:24 INFO DAGScheduler: ShuffleMapStage 40 (collect at /home/andy/git/apache/datafusion-benchmarks/runners/datafusion-comet/tpcbench.py:87) failed in 23.369 s due to Job aborted due to stage failure: Task 6 in stage 40.0 failed 4 times, most recent failure: Lost task 6.3 in stage 40.0 (TID 374) (192.168.86.42 executor 0): org.apache.comet.CometNativeException: Resources exhausted: Failed to allocate additional 132224 bytes for HashJoinInput[0] with 4809854208 bytes already allocated for this reservation - 68761 bytes remain available for the total pool
   ```
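The second failure comes from the native side rather than from Spark: the wording appears to match DataFusion's memory-pool errors, where a reservation (here `HashJoinInput[0]`) asks a shared pool to grow and is refused because the request exceeds what is left. The sketch below models that check with a hypothetical `SketchPool` class; it is not Comet's or DataFusion's API, and `POOL_SIZE` is an assumed value chosen only so that 68761 bytes remain, as in the log. It also shows that the `HashJoinInput[0]` reservation alone already held 4809854208 bytes, roughly 4.48 GiB.

```python
# Hypothetical model of a shared memory pool with per-consumer reservations.
# Not Comet's or DataFusion's API; POOL_SIZE below is an assumed value.
GIB = 1024**3


class PoolExhausted(Exception):
    pass


class SketchPool:
    def __init__(self, pool_size: int) -> None:
        self.pool_size = pool_size
        self.used = 0
        self.reservations: dict[str, int] = {}  # bytes held per consumer

    def try_grow(self, consumer: str, additional: int) -> None:
        """Grow one consumer's reservation, failing if the pool cannot cover it."""
        remaining = self.pool_size - self.used
        held = self.reservations.get(consumer, 0)
        if additional > remaining:
            raise PoolExhausted(
                f"Failed to allocate additional {additional} bytes for {consumer} "
                f"with {held} bytes already allocated for this reservation - "
                f"{remaining} bytes remain available for the total pool"
            )
        self.reservations[consumer] = held + additional
        self.used += additional


HELD = 4809854208    # bytes held by HashJoinInput[0], from the log
REMAINING = 68761    # bytes left in the pool, from the log
POOL_SIZE = 8 * GIB  # assumption: the real pool size is not shown in the log

pool = SketchPool(POOL_SIZE)
pool.try_grow("HashJoinInput[0]", HELD)
# Attribute the rest of the used memory to other (hypothetical) consumers.
pool.try_grow("other consumers", POOL_SIZE - HELD - REMAINING)

print(f"{HELD / GIB:.2f} GiB held by HashJoinInput[0]")
try:
    pool.try_grow("HashJoinInput[0]", 132224)  # the request that failed in the log
except PoolExhausted as err:
    print(err)
```

A failed grow leaves the pool unchanged, so the task errors out rather than spilling; whether spilling is possible depends on the operator, which this sketch does not model.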
   
   I then switched to using the Comet 0.2.0-rc1 jar with the same configs and the query ran without error.

