dingxin-tech opened a new issue, #9698:
URL: https://github.com/apache/incubator-gluten/issues/9698
### Backend
VL (Velox)
### Bug description
Hello team, I found that when an Executor hits an OOM error, it only prints the
error log but does not exit, so the failed work is never retried automatically.
I can reproduce the problem with a small off-heap memory setting. I am using
Gluten 1.2.1, and upgrading is not easy for me. Is this a known problem? If so,
can I cherry-pick a bugfix patch?
```
25/05/20 15:12:00 ERROR ManagedReservationListener: Error reserving memory from target
org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget$OutOfMemoryException: Not enough spark off-heap execution memory. Acquired: 8.0 MiB, granted: 341.3 KiB. Try tweaking config option spark.memory.offHeap.size to get larger space to run this application (if spark.gluten.memory.dynamic.offHeap.sizing.enabled is not enabled).
Current config settings:
    spark.gluten.memory.offHeap.size.in.bytes=1024.0 KiB
    spark.gluten.memory.task.offHeap.size.in.bytes=256.0 KiB
    spark.gluten.memory.conservative.task.offHeap.size.in.bytes=128.0 KiB
    spark.memory.offHeap.enabled=true
    spark.gluten.memory.dynamic.offHeap.sizing.enabled=false
Memory consumer stats:
    Task.0: Current used bytes: 0.0 B, peak bytes: N/A
    \- Gluten.Tree.3: Current used bytes: 0.0 B, peak bytes: 443.7 KiB
       \- root.3: Current used bytes: 0.0 B, peak bytes: 443.7 KiB
          +- OverAcquire.DummyTarget.0: Current used bytes: 0.0 B, peak bytes: 102.4 KiB
          \- RowToColumnar.3: Current used bytes: 0.0 B, peak bytes: 341.3 KiB
    at org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget.borrow(ThrowOnOomMemoryTarget.java:105)
    at org.apache.gluten.memory.listener.ManagedReservationListener.reserve(ManagedReservationListener.java:43)
    at org.apache.gluten.exec.RuntimeJniWrapper.createRuntime(Native Method)
    at org.apache.gluten.exec.Runtime$RuntimeImpl.<init>(Runtime.scala:63)
    at org.apache.gluten.exec.Runtime$.apply(Runtime.scala:48)
    at org.apache.gluten.exec.Runtimes$.create(Runtimes.scala:33)
    at org.apache.gluten.exec.Runtimes$.$anonfun$contextInstance$1(Runtimes.scala:29)
    at org.apache.spark.util.TaskResourceRegistry.$anonfun$addResourceIfNotRegistered$1(TaskResources.scala:320)
    at org.apache.spark.util.TaskResourceRegistry.lock(TaskResources.scala:245)
    at org.apache.spark.util.TaskResourceRegistry.addResourceIfNotRegistered(TaskResources.scala:316)
    at org.apache.spark.util.TaskResources$.addResourceIfNotRegistered(TaskResources.scala:157)
    at org.apache.gluten.exec.Runtimes$.contextInstance(Runtimes.scala:29)
    at org.apache.gluten.execution.RowToVeloxColumnarExec$.toColumnarBatchIterator(RowToVeloxColumnarExec.scala:117)
    at org.apache.gluten.execution.RowToVeloxColumnarExec.$anonfun$doExecuteColumnarInternal$2(RowToVeloxColumnarExec.scala:72)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:853)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:853)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
    at org.apache.gluten.execution.ColumnarInputRDDsWrapper.$anonfun$getIterators$1(WholeStageTransformer.scala:445)
    at scala.collection.immutable.List.flatMap(List.scala:366)
    at org.apache.gluten.execution.ColumnarInputRDDsWrapper.getIterators(WholeStageTransformer.scala:436)
    at org.apache.gluten.execution.WholeStageZippedPartitionsRDD.$anonfun$compute$1(WholeStageZippedPartitionsRDD.scala:48)
    at org.apache.gluten.utils.Arm$.withResource(Arm.scala:25)
    at org.apache.gluten.metrics.GlutenTimeMetric$.millis(GlutenTimeMetric.scala:37)
    at org.apache.gluten.execution.WholeStageZippedPartitionsRDD.compute(WholeStageZippedPartitionsRDD.scala:46)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
    at org.apache.spark.scheduler.Task.run(Task.scala:139)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1545)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:879)
```
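For context, this matches generic JVM behavior: an exception thrown on a pool worker thread kills only that task, not the process. A minimal, self-contained sketch of that behavior (a plain `RuntimeException` standing in here for the Gluten exception; no Gluten classes involved):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class WorkerExceptionDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // The task throws, which kills only the worker thread;
        // the stack trace goes to stderr and the process carries on.
        pool.execute(() -> {
            throw new RuntimeException("simulated reservation failure");
        });
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        // Main still reaches this line: the task died, the JVM did not.
        System.out.println("JVM still alive after worker exception");
    }
}
```

So unless something explicitly escalates the failure (for example an uncaught-exception handler that shuts the process down), the executor keeps running after the task fails.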
### Gluten version
Gluten-1.2
### Spark version
Spark-3.4.x
### Spark configurations
spark.plugins=org.apache.gluten.GlutenPlugin
spark.memory.offHeap.enabled=true
spark.memory.offHeap.size=1m
spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager
spark.gluten.sql.columnar.forceShuffledHashJoin=true
spark.gluten.sql.native.writer.enabled=false
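For reference, the error text itself suggests enlarging the off-heap pool. A sketch of the settings I would change for a reproduction run (the `2g` value is purely illustrative, not a tuned recommendation):

```
spark.memory.offHeap.enabled=true
spark.memory.offHeap.size=2g
```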
### System information
Velox System Info v0.0.2
Commit: ea7315f851bc5fb5aaf7ce5a0e27a97df4a0ec31
CMake Version: 3.14.5
System: Linux-4.19.91-011.ali4000.alios7.x86_64
Arch: x86_64
CPU Name: Model name: Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz
C++ Compiler: /usr/local/alicpp/built/gcc-9.2.1/gcc-9.2.1/bin/g++
C++ Compiler Version: 9.2.1
C Compiler: /usr/local/alicpp/built/gcc-9.2.1/gcc-9.2.1/bin/gcc
C Compiler Version: 9.2.1
CMake Prefix Path:
/usr/local;/usr;/;/usr/local;/usr/local;/usr/X11R6;/usr/pkg;/opt
### Relevant logs
(Same stack trace as quoted in the bug description above.)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]