kecookier opened a new issue, #8088:
URL: https://github.com/apache/incubator-gluten/issues/8088
### Backend
VL (Velox)
### Bug description
The `default_leaf` pool (a `velox::memory::MemoryPool`) under `ShuffleWriter`
allocated too much memory in `VeloxHashShuffleWriter`, causing an off-heap OOM.
In the consumer stats below, it holds 8.2 GiB of the task's 8.4 GiB.
```
24/11/26 21:31:42 ERROR Executor task launch worker for task 1559 ManagedReservationListener: Error reserving memory from target
org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget$OutOfMemoryException: Not enough spark off-heap execution memory. Acquired: 8.0 MiB, granted: 0.0 B. Try tweaking config option spark.memory.offHeap.size to get larger space to run this application (if spark.gluten.memory.dynamic.offHeap.sizing.enabled is not enabled).
Current config settings:
spark.gluten.memory.offHeap.size.in.bytes=13690208256
spark.gluten.memory.task.offHeap.size.in.bytes=6845104128
spark.gluten.memory.conservative.task.offHeap.size.in.bytes=3422552064
spark.gluten.memory.dynamic.offHeap.sizing.enabled=false
Memory consumer stats:
Task.1559: Current used bytes: 8.4 GiB, peak bytes: N/A
\- Gluten.Tree.0: Current used bytes: 8.4 GiB, peak bytes: 11.9 GiB
   \- root.0: Current used bytes: 8.4 GiB, peak bytes: 11.9 GiB
      +- ShuffleWriter.0: Current used bytes: 8.3 GiB, peak bytes: 8.8 GiB
      |  \- single: Current used bytes: 8.3 GiB, peak bytes: 8.8 GiB
      |     +- root: Current used bytes: 8.2 GiB, peak bytes: 8.2 GiB
      |     |  \- default_leaf: Current used bytes: 8.2 GiB, peak bytes: 8.2 GiB
      |     \- gluten::MemoryAllocator: Current used bytes: 62.9 MiB, peak bytes: 1436.4 MiB
      +- VeloxBatchAppender.0: Current used bytes: 104.0 MiB, peak bytes: 224.0 MiB
      |  \- single: Current used bytes: 104.0 MiB, peak bytes: 224.0 MiB
      |     +- root: Current used bytes: 100.2 MiB, peak bytes: 224.0 MiB
      |     |  \- default_leaf: Current used bytes: 100.2 MiB, peak bytes: 216.8 MiB
      |     \- gluten::MemoryAllocator: Current used bytes: 0.0 B, peak bytes: 0.0 B
      +- NativePlanEvaluator-1.0: Current used bytes: 25.0 MiB, peak bytes: 176.0 MiB
      |  \- single: Current used bytes: 25.0 MiB, peak bytes: 176.0 MiB
      |     +- root: Current used bytes: 22.6 MiB, peak bytes: 169.0 MiB
      |     |  +- task.Gluten_Stage_2_TID_1559_VTID_0: Current used bytes: 22.6 MiB, peak bytes: 169.0 MiB
      |     |  |  +- node.0: Current used bytes: 22.1 MiB, peak bytes: 168.0 MiB
      |     |  |  |  +- op.0.0.0.TableScan: Current used bytes: 22.1 MiB, peak bytes: 162.8 MiB
      |     |  |  |  \- op.0.0.0.TableScan.test-hive: Current used bytes: 0.0 B, peak bytes: 0.0 B
      |     |  |  \- node.1: Current used bytes: 528.2 KiB, peak bytes: 1024.0 KiB
      |     |  |     \- op.1.0.0.FilterProject: Current used bytes: 528.2 KiB, peak bytes: 849.5 KiB
      |     |  \- default_leaf: Current used bytes: 0.0 B, peak bytes: 0.0 B
      |     \- gluten::MemoryAllocator: Current used bytes: 0.0 B, peak bytes: 0.0 B
      +- ArrowContextInstance.0: Current used bytes: 0.0 B, peak bytes: 0.0 B
      +- VeloxBatchAppender.0.OverAcquire.0: Current used bytes: 0.0 B, peak bytes: 67.2 MiB
      +- IndicatorVectorBase#init.0.OverAcquire.0: Current used bytes: 0.0 B, peak bytes: 2.4 MiB
      +- NativePlanEvaluator-1.0.OverAcquire.0: Current used bytes: 0.0 B, peak bytes: 52.8 MiB
      +- ShuffleWriter.0.OverAcquire.0: Current used bytes: 0.0 B, peak bytes: 2.6 GiB
      \- IndicatorVectorBase#init.0: Current used bytes: 0.0 B, peak bytes: 8.0 MiB
         \- single: Current used bytes: 0.0 B, peak bytes: 8.0 MiB
            +- root: Current used bytes: 0.0 B, peak bytes: 0.0 B
            |  \- default_leaf: Current used bytes: 0.0 B, peak bytes: 0.0 B
            \- gluten::MemoryAllocator: Current used bytes: 0.0 B, peak bytes: 0.0 B
    at org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget.borrow(ThrowOnOomMemoryTarget.java:66)
    at org.apache.gluten.memory.listener.ManagedReservationListener.reserve(ManagedReservationListener.java:49)
    at org.apache.gluten.vectorized.ShuffleWriterJniWrapper.write(Native Method)
    at org.apache.spark.shuffle.ColumnarShuffleWriter.internalWrite(ColumnarShuffleWriter.scala:177)
    at org.apache.spark.shuffle.ColumnarShuffleWriter.write(ColumnarShuffleWriter.scala:231)
    at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
    at org.apache.spark.scheduler.Task.run(Task.scala:134)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:479)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1448)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:482)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
```
### Where is VeloxMemoryPool used in VeloxHashShuffleWriter?
When `splitComplexType()` is called, the vector is first serialized by
`PrestoVectorSerde` and later flushed to the cache by
`evictPartitionBuffers()`. The memory held by `arenas_` is freed only after
that flush.
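
The lifetime described above can be pictured with a small model. This is only a sketch, not Gluten's code: `PartitionArenas`, `serialize`, `evictAll`, and `heldBytes` are hypothetical names, and it assumes one arena per partition that grows on every serialization and is emptied only by an eviction.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the per-partition arenas (not the real arenas_ type).
class PartitionArenas {
 public:
  explicit PartitionArenas(size_t numPartitions) : bytes_(numPartitions, 0) {}

  // Serializing complex-type rows grows the target partition's arena.
  void serialize(size_t partition, uint64_t serializedBytes) {
    bytes_.at(partition) += serializedBytes;
  }

  // An eviction (flush) releases everything held so far; returns the bytes freed.
  uint64_t evictAll() {
    const uint64_t freed = heldBytes();
    bytes_.assign(bytes_.size(), 0);
    return freed;
  }

  // Total bytes currently held across all partitions.
  uint64_t heldBytes() const {
    uint64_t total = 0;
    for (uint64_t b : bytes_) {
      total += b;
    }
    return total;
  }

 private:
  std::vector<uint64_t> bytes_;
};
```

In this model, `heldBytes()` only ever increases between two calls to `evictAll()`, which is the behavior that matters for the OOM discussed here.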
### Why is so much memory used?
When `doSplit` is called, we estimate how many rows fit within the current
task's available memory and then adjust the existing partition buffers. The
estimate considers only simple-type columns, so the memory used by
complex-type columns is not counted. As we iterate batch by batch, we check
whether the current estimated row count is much larger than the size of the
existing partition buffers. If so, we cache these buffers (evict the partition
buffers to `payloadCache`); the cached payload is spilled later, which frees
the memory. If the complex-type vectors are large, this eviction is typically
not triggered until the task has already run out of memory (OOM).
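
To make the gap concrete, here is a minimal sketch of an estimate that counts only simple columns. It is an illustration under stated assumptions, not the actual `doSplit` logic: `RowSizeEstimate`, `estimatedRows`, and `actualBytes` are hypothetical names, and the per-row sizes are placeholders.

```cpp
#include <cstdint>

// Hypothetical per-row sizes; not taken from Gluten.
struct RowSizeEstimate {
  uint64_t simpleBytesPerRow;   // counted by the estimate
  uint64_t complexBytesPerRow;  // serialized complex-type bytes, not counted
};

// Rows the estimator believes fit in the available memory (simple columns only).
inline uint64_t estimatedRows(uint64_t availableBytes, const RowSizeEstimate& e) {
  return e.simpleBytesPerRow == 0 ? 0 : availableBytes / e.simpleBytesPerRow;
}

// Memory those rows occupy once the complex columns are included.
inline uint64_t actualBytes(uint64_t rows, const RowSizeEstimate& e) {
  return rows * (e.simpleBytesPerRow + e.complexBytesPerRow);
}
```

With, say, 24 simple bytes and 1000 complex bytes per row, the rows estimated for a 1 MiB budget would actually occupy far more than 1 MiB, so a check based on the estimate alone does not fire in time.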
### Possible Solutions
1. The default partition buffer size is 4096. In our case, the schema is
`{int, string, map<string, string>, map<string, string>}`, and the process
runs out of memory after iterating 200+ batches. Changing this option to 200
lets the job succeed, but it is not a general solution.
2. When estimating how many rows can fit within the current task's available
memory, also consider complex type columns. We can use `arenas_` to do this.
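
One way option 2 might look is sketched below. This is a hedged illustration rather than a proposed patch: `avgComplexBytesPerRow` and `estimatedRowsWithComplex` are hypothetical helpers, and the sketch assumes the total bytes currently held by `arenas_` and the number of rows already buffered are both available to the estimator.

```cpp
#include <cstdint>

// Average serialized complex-type bytes per buffered row, rounded up.
// arenaBytes is assumed to be the total size currently held by the arenas.
inline uint64_t avgComplexBytesPerRow(uint64_t arenaBytes, uint64_t rowsBuffered) {
  return rowsBuffered == 0 ? 0 : (arenaBytes + rowsBuffered - 1) / rowsBuffered;
}

// Row estimate that charges each row for both simple and complex columns.
inline uint64_t estimatedRowsWithComplex(
    uint64_t availableBytes,
    uint64_t simpleBytesPerRow,
    uint64_t arenaBytes,
    uint64_t rowsBuffered) {
  const uint64_t perRow =
      simpleBytesPerRow + avgComplexBytesPerRow(arenaBytes, rowsBuffered);
  return perRow == 0 ? 0 : availableBytes / perRow;
}
```

When nothing has been serialized yet, the result falls back to the simple-column estimate; as the arenas grow, the estimated row count shrinks, which would let the existing eviction check trigger earlier.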
### Spark version
None
### Spark configurations
_No response_
### System information
_No response_
### Relevant logs
_No response_