FelixYBW opened a new issue, #7249:
URL: https://github.com/apache/incubator-gluten/issues/7249

   ### Backend
   
   VL (Velox)
   
   ### Bug description
   
   This is a new issue triggered by #6988.
   
   The root cause is that Velox's sort needs to allocate a large memory buffer from 
global memory when a spill is triggered, and that allocation fails because the 
allocator is already at its limit. This looks like a design issue in the spill path. 
   
   ```
   W20240914 06:04:39.696241 48552 MallocAllocator.cpp:267] [MEM] Failed to 
allocateBytes 256.00MB: Exceeded memory allocator limit of 3.00GB
   E20240914 06:04:39.696458 48552 Exceptions.h:67] Line: 
/home/binweiyang/gluten/ep/build-velox/build/velox_ep/velox/common/memory/MemoryPool.cpp:1314,
 Function:handleAllocationFailure, Expression:  allocate failed with 256.00MB 
from Memory Pool[__sys_spilling__ LEAF root[__sys_root__] parent[__sys_root__] 
MALLOC no-usage-track thread-safe]<unlimited max capacity unlimited capacity 
used 0B available 0B reservation [used 0B, reserved 0B, min 0B] counters 
[allocs 109, frees 103, reserves 0, releases 0, collisions 0])> Failed to 
allocateBytes 256.00MB: Exceeded memory allocator limit of 3.00GB, Source: 
RUNTIME, ErrorCode: MEM_ALLOC_ERROR
   24/09/14 06:04:39 ERROR [Executor task launch worker for task 1188.0 in 
stage 2.0 (TID 116257)] listener.ManagedReservationListener: Error reserving 
memory from target
   org.apache.gluten.exception.GlutenException: Exception: VeloxRuntimeError
   Error Source: RUNTIME
   Error Code: MEM_ALLOC_ERROR
   Reason: allocate failed with 256.00MB from Memory Pool[__sys_spilling__ LEAF 
root[__sys_root__] parent[__sys_root__] MALLOC no-usage-track 
thread-safe]<unlimited max capacity unlimited capacity used 0B available 0B 
reservation [used 0B, reserved 0B, min 0B] counters [allocs 109, frees 103, 
reserves 0, releases 0, collisions 0])> Failed to allocateBytes 256.00MB: 
Exceeded memory allocator limit of 3.00GB
   Retriable: True
   Context: Operator: OrderBy[1] 1
   Function: handleAllocationFailure
   File: 
/home/binweiyang/gluten/ep/build-velox/build/velox_ep/velox/common/memory/MemoryPool.cpp
   Line: 1314
   Stack trace:
   # 0  _ZN8facebook5velox7process10StackTraceC1Ei
   # 1  
_ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
   # 2  
_ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRKNS1_18VeloxCheckFailArgsET0_
   # 3  
_ZN8facebook5velox6memory14MemoryPoolImpl23handleAllocationFailureERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
   # 4  _ZN8facebook5velox6memory14MemoryPoolImpl8allocateEl
   # 5  
_ZN8facebook5velox4exec7Spiller13fillSpillRunsEPNS1_20RowContainerIteratorE
   # 6  _ZN8facebook5velox4exec7Spiller5spillEPKNS1_20RowContainerIteratorE
   # 7  _ZN8facebook5velox4exec10SortBuffer10spillInputEv
   # 8  
_ZN8facebook5velox4exec7OrderBy7reclaimEmRNS0_6memory15MemoryReclaimer5StatsE
   # 9  
_ZNSt17_Function_handlerIFlvEZN8facebook5velox4exec8Operator15MemoryReclaimer7reclaimEPNS2_6memory10MemoryPoolEmmRNS6_15MemoryReclaimer5StatsEEUlvE_E9_M_invokeERKSt9_Any_data
   # 10 
_ZN8facebook5velox6memory15MemoryReclaimer3runERKSt8functionIFlvEERNS2_5StatsE
   # 11 
_ZN8facebook5velox4exec8Operator15MemoryReclaimer7reclaimEPNS0_6memory10MemoryPoolEmmRNS4_15MemoryReclaimer5StatsE
   # 12 
_ZN8facebook5velox6memory15MemoryReclaimer7reclaimEPNS1_10MemoryPoolEmmRNS2_5StatsE
   # 13 
_ZN8facebook5velox4exec23ParallelMemoryReclaimer7reclaimEPNS0_6memory10MemoryPoolEmmRNS3_15MemoryReclaimer5StatsE
   # 14 
_ZN8facebook5velox6memory15MemoryReclaimer7reclaimEPNS1_10MemoryPoolEmmRNS2_5StatsE
   # 15 
_ZN8facebook5velox4exec4Task15MemoryReclaimer11reclaimTaskERKSt10shared_ptrIS2_EmmRNS0_6memory15MemoryReclaimer5StatsE
   # 16 
_ZN8facebook5velox4exec4Task15MemoryReclaimer7reclaimEPNS0_6memory10MemoryPoolEmmRNS4_15MemoryReclaimer5StatsE
   # 17 
_ZN8facebook5velox6memory15MemoryReclaimer7reclaimEPNS1_10MemoryPoolEmmRNS2_5StatsE
   # 18 _ZN6gluten20ListenableArbitrator14shrinkCapacityEmbb
   # 19 _ZN6gluten24WholeStageResultIterator14spillFixedSizeEl
   # 20 Java_org_apache_gluten_vectorized_ColumnarBatchOutIterator_nativeSpill
   # 21 0x00007ff1f89bf427
   ```
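
To make the failure mode concrete, here is a minimal toy model in Python. It is not Velox or Gluten code: `CappedAllocator`, `spill`, the 100 MB headroom, and the 64 MB scratch size are all hypothetical names and numbers chosen for illustration. Only the 3 GB allocator limit and the 256 MB request come from the log above. The sketch assumes the spill path must obtain a scratch buffer from the same capped allocator before it can release any operator memory, which is how the trace reads (`OrderBy::reclaim` -> `SortBuffer::spillInput` -> `Spiller::fillSpillRuns` -> `MemoryPoolImpl::allocate`).

```python
MB = 1 << 20
GB = 1 << 30


class CappedAllocator:
    """Hypothetical stand-in for a process-wide allocator with a hard byte limit."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def allocate(self, size_bytes):
        # Reject any request that would push usage past the limit.
        if self.used_bytes + size_bytes > self.limit_bytes:
            raise MemoryError(
                f"failed to allocate {size_bytes // MB} MB: "
                f"limit of {self.limit_bytes // MB} MB exceeded"
            )
        self.used_bytes += size_bytes

    def free(self, size_bytes):
        self.used_bytes -= size_bytes


def spill(allocator, operator_bytes, scratch_bytes):
    """Release operator_bytes, but only after obtaining a scratch buffer first."""
    allocator.allocate(scratch_bytes)  # raises if there is no headroom left
    allocator.free(operator_bytes)     # rows written out, operator memory released
    allocator.free(scratch_bytes)      # scratch buffer no longer needed
    return operator_bytes


# Assumed scenario: the sort already holds all but 100 MB of a 3 GB limit.
allocator = CappedAllocator(3 * GB)
held = 3 * GB - 100 * MB
allocator.allocate(held)

try:
    # The reclaim itself needs 256 MB, which no longer fits.
    spill(allocator, held, 256 * MB)
except MemoryError as err:
    print("spill failed:", err)

# With a scratch buffer smaller than the remaining headroom, the same spill succeeds.
print("reclaimed MB:", spill(allocator, held, 64 * MB) // MB)
```

The point of the sketch is the ordering: the allocation that the reclaim needs happens before any memory is released, so a spill triggered under memory pressure can fail for the same reason it was triggered. Note that in the log the `__sys_spilling__` pool itself reports unlimited capacity, so the limit being hit is the allocator's, not the pool's. The smaller scratch size is only there to show the contrast and is not a proposed fix.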
   
   @zhztheplayer 
   
   ### Spark version
   
   None
   
   ### Spark configurations
   
   _No response_
   
   ### System information
   
   _No response_
   
   ### Relevant logs
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

