FelixYBW opened a new issue, #7249: URL: https://github.com/apache/incubator-gluten/issues/7249
### Backend

VL (Velox)

### Bug description

This is a new issue triggered by #6988. The root cause is that Velox's sort needs to allocate a large memory buffer from global memory when spill is triggered. There appears to be a design issue in that path.

```
W20240914 06:04:39.696241 48552 MallocAllocator.cpp:267] [MEM] Failed to allocateBytes 256.00MB: Exceeded memory allocator limit of 3.00GB
E20240914 06:04:39.696458 48552 Exceptions.h:67] Line: /home/binweiyang/gluten/ep/build-velox/build/velox_ep/velox/common/memory/MemoryPool.cpp:1314, Function:handleAllocationFailure, Expression: allocate failed with 256.00MB from Memory Pool[__sys_spilling__ LEAF root[__sys_root__] parent[__sys_root__] MALLOC no-usage-track thread-safe]<unlimited max capacity unlimited capacity used 0B available 0B reservation [used 0B, reserved 0B, min 0B] counters [allocs 109, frees 103, reserves 0, releases 0, collisions 0])> Failed to allocateBytes 256.00MB: Exceeded memory allocator limit of 3.00GB, Source: RUNTIME, ErrorCode: MEM_ALLOC_ERROR
24/09/14 06:04:39 ERROR [Executor task launch worker for task 1188.0 in stage 2.0 (TID 116257)] listener.ManagedReservationListener: Error reserving memory from target
org.apache.gluten.exception.GlutenException: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: MEM_ALLOC_ERROR
Reason: allocate failed with 256.00MB from Memory Pool[__sys_spilling__ LEAF root[__sys_root__] parent[__sys_root__] MALLOC no-usage-track thread-safe]<unlimited max capacity unlimited capacity used 0B available 0B reservation [used 0B, reserved 0B, min 0B] counters [allocs 109, frees 103, reserves 0, releases 0, collisions 0])> Failed to allocateBytes 256.00MB: Exceeded memory allocator limit of 3.00GB
Retriable: True
Context: Operator: OrderBy[1] 1
Function: handleAllocationFailure
File: /home/binweiyang/gluten/ep/build-velox/build/velox_ep/velox/common/memory/MemoryPool.cpp
Line: 1314
Stack trace:
# 0 _ZN8facebook5velox7process10StackTraceC1Ei
# 1 _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 2 _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRKNS1_18VeloxCheckFailArgsET0_
# 3 _ZN8facebook5velox6memory14MemoryPoolImpl23handleAllocationFailureERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
# 4 _ZN8facebook5velox6memory14MemoryPoolImpl8allocateEl
# 5 _ZN8facebook5velox4exec7Spiller13fillSpillRunsEPNS1_20RowContainerIteratorE
# 6 _ZN8facebook5velox4exec7Spiller5spillEPKNS1_20RowContainerIteratorE
# 7 _ZN8facebook5velox4exec10SortBuffer10spillInputEv
# 8 _ZN8facebook5velox4exec7OrderBy7reclaimEmRNS0_6memory15MemoryReclaimer5StatsE
# 9 _ZNSt17_Function_handlerIFlvEZN8facebook5velox4exec8Operator15MemoryReclaimer7reclaimEPNS2_6memory10MemoryPoolEmmRNS6_15MemoryReclaimer5StatsEEUlvE_E9_M_invokeERKSt9_Any_data
# 10 _ZN8facebook5velox6memory15MemoryReclaimer3runERKSt8functionIFlvEERNS2_5StatsE
# 11 _ZN8facebook5velox4exec8Operator15MemoryReclaimer7reclaimEPNS0_6memory10MemoryPoolEmmRNS4_15MemoryReclaimer5StatsE
# 12 _ZN8facebook5velox6memory15MemoryReclaimer7reclaimEPNS1_10MemoryPoolEmmRNS2_5StatsE
# 13 _ZN8facebook5velox4exec23ParallelMemoryReclaimer7reclaimEPNS0_6memory10MemoryPoolEmmRNS3_15MemoryReclaimer5StatsE
# 14 _ZN8facebook5velox6memory15MemoryReclaimer7reclaimEPNS1_10MemoryPoolEmmRNS2_5StatsE
# 15 _ZN8facebook5velox4exec4Task15MemoryReclaimer11reclaimTaskERKSt10shared_ptrIS2_EmmRNS0_6memory15MemoryReclaimer5StatsE
# 16 _ZN8facebook5velox4exec4Task15MemoryReclaimer7reclaimEPNS0_6memory10MemoryPoolEmmRNS4_15MemoryReclaimer5StatsE
# 17 _ZN8facebook5velox6memory15MemoryReclaimer7reclaimEPNS1_10MemoryPoolEmmRNS2_5StatsE
# 18 _ZN6gluten20ListenableArbitrator14shrinkCapacityEmbb
# 19 _ZN6gluten24WholeStageResultIterator14spillFixedSizeEl
# 20 Java_org_apache_gluten_vectorized_ColumnarBatchOutIterator_nativeSpill
# 21 0x00007ff1f89bf427
```

@zhztheplayer

### Spark version

None

### Spark configurations

_No response_

### System information

_No response_

### Relevant logs

_No response_
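The failure mode in the bug description can be illustrated with a toy model. This is only a sketch and not Velox code: `CappedAllocator` and `spill` are hypothetical names, and the 3000 MB held by the sort operator is an assumed figure. Only the 256 MB request and the 3 GB allocator limit come from the log. The point is that a spill which must first allocate a scratch buffer from the same capped allocator cannot run when memory is already nearly exhausted, which is exactly when spill is triggered.

```python
MB = 1 << 20


class CappedAllocator:
    """Hypothetical stand-in for a process-wide allocator with a hard byte limit."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def allocate(self, nbytes):
        # Reject any request that would push total usage past the limit.
        if self.used_bytes + nbytes > self.limit_bytes:
            raise MemoryError(
                f"Failed to allocate {nbytes // MB}MB: exceeded allocator limit"
            )
        self.used_bytes += nbytes

    def free(self, nbytes):
        self.used_bytes -= nbytes


def spill(allocator, scratch_bytes, operator_bytes):
    """Toy spill: a scratch buffer is needed before operator memory is released.

    Returns True if the spill ran, False if the scratch allocation failed.
    """
    try:
        allocator.allocate(scratch_bytes)
    except MemoryError:
        # Nothing was spilled, so the operator still holds all of its memory.
        return False
    allocator.free(operator_bytes)  # rows are now on disk
    allocator.free(scratch_bytes)   # scratch buffer is no longer needed
    return True


if __name__ == "__main__":
    allocator = CappedAllocator(3072 * MB)  # 3.00GB limit, as in the log
    allocator.allocate(3000 * MB)           # assumed usage of the sort operator
    print("spill succeeded:", spill(allocator, 256 * MB, 3000 * MB))
```

In this model the spill only succeeds if the scratch buffer fits under the limit at the moment of reclaim, so a smaller scratch buffer or memory reserved for spilling ahead of time would both avoid the failure. Whether either option suits Velox is an open question for the maintainers.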
