wForget opened a new issue, #9475:
URL: https://github.com/apache/incubator-gluten/issues/9475
### Description
An OOM exception occurred during broadcast, even though the task's input was
only 69.3 MiB.
```
25/04/29 16:30:44 WARN ManagedReservationListener: Error reserving memory from target
org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget$OutOfMemoryException: Not enough spark off-heap execution memory. Acquired: 1056.0 MiB, granted: 120.0 MiB. Try tweaking config option spark.memory.offHeap.size to get larger space to run this application (if spark.gluten.memory.dynamic.offHeap.sizing.enabled is not enabled).
Current config settings:
spark.gluten.memory.offHeap.size.in.bytes=4.0 GiB
spark.gluten.memory.task.offHeap.size.in.bytes=4.0 GiB
spark.gluten.memory.conservative.task.offHeap.size.in.bytes=2.0 GiB
spark.memory.offHeap.enabled=true
spark.gluten.memory.dynamic.offHeap.sizing.enabled=false
Memory consumer stats:
Task.10: Current used bytes: 3.9 GiB, peak bytes: N/A
\- Gluten.Tree.7: Current used bytes: 3.9 GiB, peak bytes: 4.0 GiB
   \- root.7: Current used bytes: 3.9 GiB, peak bytes: 4.0 GiB
      +- BroadcastUtils#serializeStream.7: Current used bytes: 2.1 GiB, peak bytes: 2.2 GiB
      |  \- single: Current used bytes: 2.1 GiB, peak bytes: 2.1 GiB
      |     +- gluten::MemoryAllocator: Current used bytes: 1063.2 MiB, peak bytes: 1063.2 MiB
      |     \- root: Current used bytes: 1059.2 MiB, peak bytes: 1064.0 MiB
      |        \- default_leaf: Current used bytes: 1059.2 MiB, peak bytes: 1059.2 MiB
      +- WholeStageIterator.7: Current used bytes: 1848.0 MiB, peak bytes: 1864.0 MiB
      |  \- single: Current used bytes: 1848.0 MiB, peak bytes: 1864.0 MiB
      |     +- root: Current used bytes: 1841.9 MiB, peak bytes: 1864.0 MiB
      |     |  +- task.Gluten_Stage_6_TID_10_VTID_7: Current used bytes: 1841.9 MiB, peak bytes: 1864.0 MiB
      |     |  |  \- node.0: Current used bytes: 1841.9 MiB, peak bytes: 1864.0 MiB
      |     |  |     +- op.0.0.0.TableScan: Current used bytes: 1841.9 MiB, peak bytes: 1857.1 MiB
      |     |  |     \- op.0.0.0.TableScan.test-hive: Current used bytes: 0.0 B, peak bytes: 0.0 B
      |     |  \- default_leaf: Current used bytes: 0.0 B, peak bytes: 0.0 B
      |     \- gluten::MemoryAllocator: Current used bytes: 0.0 B, peak bytes: 0.0 B
      +- IndicatorVectorBase#init.7: Current used bytes: 0.0 B, peak bytes: 8.0 MiB
      |  \- single: Current used bytes: 0.0 B, peak bytes: 8.0 MiB
      |     +- gluten::MemoryAllocator: Current used bytes: 0.0 B, peak bytes: 0.0 B
      |     \- root: Current used bytes: 0.0 B, peak bytes: 0.0 B
      |        \- default_leaf: Current used bytes: 0.0 B, peak bytes: 0.0 B
      +- OverAcquire.DummyTarget.23: Current used bytes: 0.0 B, peak bytes: 319.2 MiB
      +- OverAcquire.DummyTarget.22: Current used bytes: 0.0 B, peak bytes: 2.4 MiB
      \- OverAcquire.DummyTarget.21: Current used bytes: 0.0 B, peak bytes: 559.2 MiB
    at org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget.borrow(ThrowOnOomMemoryTarget.java:105)
    at org.apache.gluten.memory.listener.ManagedReservationListener.reserve(ManagedReservationListener.java:43)
    at org.apache.gluten.vectorized.ColumnarBatchSerializerJniWrapper.serialize(Native Method)
    at org.apache.spark.sql.execution.BroadcastUtils$.serializeStream(BroadcastUtils.scala:160)
    at org.apache.gluten.backendsapi.velox.VeloxSparkPlanExecApi.$anonfun$createBroadcastRelation$1(VeloxSparkPlanExecApi.scala:619)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:855)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:855)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
    at org.apache.spark.scheduler.Task.run(Task.scala:141)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
W20250429 16:30:44.905328 3567390 MemoryAllocator.cpp:265] [MEM] Exceeded memory reservation limit when reserve 271145 new pages when allocate 271145 pages, error: std::exception
```
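For reference, the numbers in the log line up with the failure: the task off-heap cap is 4 GiB, roughly 3.9 GiB is already held (about 2.1 GiB by `BroadcastUtils#serializeStream` plus about 1848 MiB by `WholeStageIterator`), so when the serializer tries to acquire another 1056 MiB only ~120 MiB can be granted. A quick sanity check of that accounting (values approximated from the log above):

```scala
// Rough accounting check of the OOM in the log above. All values in MiB,
// taken (approximately) from the memory consumer stats.
val taskCapMiB   = 4096.0      // spark.gluten.memory.task.offHeap.size.in.bytes = 4.0 GiB
val serializeMiB = 2.1 * 1024  // BroadcastUtils#serializeStream: ~2.1 GiB
val scanMiB      = 1848.0      // WholeStageIterator (TableScan still holding its batches)
val requestedMiB = 1056.0      // the acquisition that failed ("Acquired: 1056.0 MiB")

val usedMiB     = serializeMiB + scanMiB   // ~3998 MiB already reserved
val headroomMiB = taskCapMiB - usedMiB     // ~98 MiB left under the task cap

// Headroom is far below the 1056 MiB request, hence the OOM even though
// the task's input was only 69.3 MiB.
```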


Multiple copies of the data are held at once while serializing broadcast
columnar batches:
+ the input columnar batches
+ the serializer buffer (holds all batches)
+ the output buffer (holds all batches)

There are two possible improvements:
1. Serialize the columnar batches one by one (but this introduces multiple
JNI calls).
2. Flush to the output buffer immediately after the serializer appends a
batch, so the serializer buffer only needs to hold a single batch.
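Option 2 could look roughly like the following minimal, Spark-free sketch (the object and method names are hypothetical, not Gluten's actual API). Instead of accumulating every serialized batch in the serializer buffer and then copying to the output buffer, each batch is written to the output stream as soon as it is serialized, so the serializer's working buffer never holds more than one batch:

```scala
import java.io.ByteArrayOutputStream

// Hypothetical sketch: compare buffer-all serialization with per-batch
// flushing. A "batch" here is just its serialized payload (Array[Byte]).
object StreamingBroadcastSerializer {
  type Batch = Array[Byte]

  // Buffer-all approach (roughly the current behavior): the serializer
  // buffer holds every batch before anything reaches the output buffer.
  // Returns (output bytes, peak serializer-buffer bytes).
  def serializeBufferAll(batches: Seq[Batch]): (Array[Byte], Long) = {
    val serializerBuf = new ByteArrayOutputStream()
    batches.foreach(b => serializerBuf.write(b))
    val peak = serializerBuf.size().toLong // all batches resident at once
    (serializerBuf.toByteArray, peak)
  }

  // Option 2: serialize one batch, flush it straight to the output,
  // then reuse the buffer for the next batch.
  def serializeStreaming(batches: Seq[Batch]): (Array[Byte], Long) = {
    val out = new ByteArrayOutputStream()
    var peak = 0L
    batches.foreach { b =>
      // The serializer buffer holds exactly one batch before the flush.
      peak = math.max(peak, b.length.toLong)
      out.write(b)
    }
    (out.toByteArray, peak)
  }
}
```

Both paths produce the same output bytes, but peak serializer-buffer usage drops from the sum of all batch sizes to the size of the largest batch, and it stays a single JNI call per partition rather than one per batch as in option 1.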
### Gluten version
None