ding-young commented on code in PR #15700: URL: https://github.com/apache/datafusion/pull/15700#discussion_r2218105116
########## datafusion/physical-plan/src/spill/get_size.rs: ########## @@ -0,0 +1,216 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use arrow::array::{ + Array, ArrayRef, ArrowPrimitiveType, BooleanArray, FixedSizeBinaryArray, + FixedSizeListArray, GenericByteArray, GenericListArray, OffsetSizeTrait, + PrimitiveArray, RecordBatch, StructArray, +}; +use arrow::buffer::{BooleanBuffer, Buffer, NullBuffer, OffsetBuffer}; +use arrow::datatypes::{ArrowNativeType, ByteArrayType}; +use arrow::downcast_primitive_array; +use arrow_schema::DataType; + +/// TODO - NEED TO MOVE THIS TO ARROW

Review Comment:

I've also experienced a case where using `batch.get_array_memory_size` leads to overestimation in aggregation. But that's not because arrow over-allocates buffers; it's because we **slice** the batch when we spill in `spill_record_batch_by_size_(and_return_max_batch_memory)`. Slicing a batch does not change the buffer capacity that `batch.get_array_memory_size` relies on, so each sliced batch reports the full buffer memory usage even though the buffers are shared between the slices.
When running `test_single_mode_aggregate_single_mode_aggregate_with_spill()` in `aggregation_fuzz` with the [reproducer branch](https://github.com/ding-young/datafusion/tree/batch-memory-reproducer), we can verify that naively relying on `batch.get_array_memory_size` with sliced batches can lead to overestimation:

```
---- fuzz_cases::aggregate_fuzz::test_single_mode_aggregate_single_mode_aggregate_with_spill stdout ----
batch.get_actually_used_size:21492, batch.get_array_memory_size:178946
batch.get_actually_used_size:21624, batch.get_array_memory_size:178946
batch.get_actually_used_size:21635, batch.get_array_memory_size:178946
batch.get_actually_used_size:21620, batch.get_array_memory_size:178946
batch.get_actually_used_size:21636, batch.get_array_memory_size:178946
...
```

Since sort does not use the `spill_record_batch_by_size` API, no slicing is involved, so (I believe) only aggregation suffered from this issue. If slicing was the only cause of the overestimation, I think we can just add special-case handling for aggregation and keep things simple with the existing `batch.get_array_memory_size` API. I wonder if the overestimation @rluvaton encountered was caused by this issue, or if it was due to something else.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org