Kontinuation opened a new issue, #887:
URL: https://github.com/apache/datafusion-comet/issues/887

   ### Describe the bug
   
   We've seen this exception when running queries with `spark.comet.exec.shuffle.mode=native`:
   
   ```
   Py4JJavaError: An error occurred while calling o456.collectToPython.
   : org.apache.spark.SparkException: Job aborted due to stage failure: Task 4 in stage 174.0 failed 4 times, most recent failure: Lost task 4.3 in stage 174.0 (TID 9264) (10.0.132.242 executor 7): org.apache.comet.CometNativeException: External error: Resources exhausted: Failed to allocate additional 913120256 bytes for ShuffleRepartitioner[0] with 0 bytes already allocated for this reservation - 901355929 bytes remain available for the total pool
        at org.apache.comet.Native.executePlan(Native Method)
        at org.apache.comet.CometExecIterator.getNextBatch(CometExecIterator.scala:105)
        at org.apache.comet.CometExecIterator.hasNext(CometExecIterator.scala:128)
        at org.apache.spark.sql.comet.execution.shuffle.CometShuffleWriteProcessor.write(CometShuffleExchangeExec.scala:496)
        at org.apache.spark.sql.comet.shims.ShimCometShuffleWriteProcessor.write(ShimCometShuffleWriteProcessor.scala:35)
        at org.apache.spark.sql.comet.shims.ShimCometShuffleWriteProcessor.write$(ShimCometShuffleWriteProcessor.scala:28)
        at org.apache.spark.sql.comet.execution.shuffle.CometShuffleWriteProcessor.write(CometShuffleExchangeExec.scala:452)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:101)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
        at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
        at org.apache.spark.scheduler.Task.run(Task.scala:139)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
   ```
   
   This happens when running [TPC-H Query 10](https://github.com/apache/datafusion-benchmarks/blob/main/tpch/queries/q10.sql) with scale factor = 1. The memory allocated for Comet is quite small, but that should not prevent the query from completing.
   
   ### Steps to reproduce
   
   Run TPC-H query 10 on a Spark cluster. The detailed environment and Spark configuration are listed under **Additional context**.
   
   ### Expected behavior
   
   All TPC-H queries should finish successfully.
   
   
   
   ### Additional context
   
   The problem was reproduced on a self-deployed Kubernetes Spark cluster on AWS.
   
   * Driver/executor instance type: r7i.2xlarge (8 vCPUs, 64GB memory)
   * Executor pod resource limit: 6 vCPUs, 48GB memory. We intentionally reserved some of the instance's resources.
   * Number of executor instances: 48
   * Spark version: 3.4.0
   * Java version: 17
   * Comet version: commit https://github.com/apache/datafusion-comet/commit/9205f0d1913933f2cc8767c02a7728a4e318dd49
   
   Here are the relevant Spark configurations:
   
   ```
   spark.executor.cores 6
   spark.executor.memory 30719m
   # Reserve native memory for Comet, Python, and other off-heap usage
   spark.executor.memoryOverheadFactor 0.6
   # Each executor core gets 1.2 GB of memory for Comet, so all 6 cores will use 7.2 GB.
   # I know this is too small for Comet, but it should not prevent the query from finishing.
   spark.comet.memory.overhead.factor 0.04
   
   spark.sql.extensions org.apache.comet.CometSparkSessionExtensions
   spark.comet.enabled true
   spark.comet.exec.enabled true
   spark.comet.exec.all.enabled true
   spark.comet.exec.shuffle.enabled true
   spark.comet.exec.shuffle.mode auto
   spark.shuffle.manager 
org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager
   ```
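   A back-of-the-envelope check of the figures in the config comment above, assuming the 1.2 GB-per-core reading stated there (this is a sketch of the arithmetic, not Comet's exact internal formula):
   
   ```python
   # Values taken from the Spark configuration above.
   executor_memory_mib = 30719      # spark.executor.memory (30719m)
   comet_overhead_factor = 0.04     # spark.comet.memory.overhead.factor
   executor_cores = 6               # spark.executor.cores
   
   per_core_mib = executor_memory_mib * comet_overhead_factor  # ~1229 MiB ≈ 1.2 GB
   total_mib = per_core_mib * executor_cores                   # ~7373 MiB ≈ 7.2 GB
   print(f"per core ≈ {per_core_mib:.0f} MiB, total ≈ {total_mib:.0f} MiB")
   ```
   
   Either way, the pool is well under the ~871 MiB single reservation the shuffle repartitioner attempted in the trace above, so it should spill rather than fail.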


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

