2010YOUY01 opened a new issue, #13089: URL: https://github.com/apache/datafusion/issues/13089
### Describe the bug

The query below requires 65M of memory to run. If we set the memory limit to 50M, it cannot run successfully.

Run in datafusion-cli:
```
cargo run -- --mem-pool-type fair -m 50M -c "
select t1.v1, sum(t2.v1)
from unnest(generate_series(1,1000)) as t1(v1), unnest(generate_series(1,1000)) as t2(v1)
group by t1.v1, t2.v1"

Error: External error: Resources exhausted: Failed to allocate additional 47616 bytes for GroupedHashAggregateStream[0] with 3995896 bytes already allocated for this reservation - 4031073 bytes remain available for the total pool
```

The issue is that memory usage is over-estimated during the sort-merge:
https://github.com/apache/datafusion/blob/f2da32b3bde851c34e9df0a2f4c174a5392f8897/datafusion/physical-plan/src/sorts/builder.rs#L72

For example, for a `RecordBatch` with 3 arrays that all share the same buffers, `record_batch.get_array_memory_size()` will estimate 3X the actual memory consumption.
(The original `RecordBatch`es passing through datafusion operators don't share `Buffer`s between different columns, but in spilling queries, `RecordBatch`es are first written to disk and read back, after which they reuse `Buffer`s across different column arrays.)

The root cause is already reported in `arrow-rs`: https://github.com/apache/arrow-rs/issues/6363

Once it's fixed in arrow, we should check whether this aggregation query can run successfully, and also add tests.

### To Reproduce

_No response_

### Expected behavior

_No response_

### Additional context

_No response_
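The over-estimation described above can be illustrated with a minimal, std-only Rust sketch. It does not use arrow's API: `Column`, `naive_size`, and `deduplicated_size` are hypothetical names introduced only for illustration, and counting each allocation once by pointer is one possible way to avoid the double counting, not necessarily how it is or will be fixed in `arrow-rs`.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical stand-in for a column array backed by a reference-counted buffer.
struct Column {
    buffer: Arc<Vec<u8>>,
}

/// Naive estimate: every column reports the full size of its buffer,
/// so a buffer shared by several columns is counted once per column.
fn naive_size(columns: &[Column]) -> usize {
    columns.iter().map(|c| c.buffer.len()).sum()
}

/// Deduplicated estimate: each distinct allocation is counted once,
/// identified by the address the `Arc` points to.
fn deduplicated_size(columns: &[Column]) -> usize {
    let mut seen = HashSet::new();
    columns
        .iter()
        .filter(|c| seen.insert(Arc::as_ptr(&c.buffer)))
        .map(|c| c.buffer.len())
        .sum()
}

fn main() {
    // Three columns that all point at the same 1024-byte buffer,
    // similar to the situation described for batches read back from a spill file.
    let shared = Arc::new(vec![0u8; 1024]);
    let columns: Vec<Column> = (0..3)
        .map(|_| Column { buffer: Arc::clone(&shared) })
        .collect();

    println!("naive estimate:        {} bytes", naive_size(&columns)); // 3072
    println!("deduplicated estimate: {} bytes", deduplicated_size(&columns)); // 1024
}
```

With three columns sharing one buffer, the naive sum is 3X the memory actually held, which matches the ratio mentioned in the report.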
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org