2010YOUY01 opened a new issue, #13089:
URL: https://github.com/apache/datafusion/issues/13089

   ### Describe the bug
   
   The query below needs about 65M of memory to run. With the memory limit set to 50M, it fails with a resources-exhausted error.
   Run in datafusion-cli:
   ```
   cargo run -- --mem-pool-type fair -m 50M -c "
   select t1.v1,  sum(t2.v1)
   from
   unnest(generate_series(1,1000)) as t1(v1)
   , unnest(generate_series(1,1000)) as t2(v1)
   group by t1.v1, t2.v1"
   
   Error: External error: Resources exhausted: Failed to allocate additional 47616 bytes for GroupedHashAggregateStream[0] with 3995896 bytes already allocated for this reservation - 4031073 bytes remain available for the total pool
   ```
   
   The issue is that memory usage is over-estimated during sort-merge:
   
https://github.com/apache/datafusion/blob/f2da32b3bde851c34e9df0a2f4c174a5392f8897/datafusion/physical-plan/src/sorts/builder.rs#L72
   For example, if a `RecordBatch` has 3 arrays that all share the same buffers, `record_batch.get_array_memory_size()` estimates 3X the actual memory consumption.
   (The original `RecordBatch`es passing through DataFusion operators don't share `Buffer`s between columns. In spilling queries, however, `RecordBatch`es are first written to disk and read back, and the batches that are read back reuse `Buffer`s across their column arrays.)
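
The over-counting described above can be illustrated with a small std-only Rust model. This is only a sketch: `ColumnView`, `shared_batch`, `naive_size` and `deduplicated_size` are hypothetical names made up for illustration, not the arrow-rs or DataFusion API, and the sketch assumes each column is charged for the whole buffer it points into.

```rust
use std::collections::HashSet;
use std::rc::Rc;

/// Hypothetical stand-in for an Arrow array: a view (offset + length) into a
/// reference-counted buffer. This is a model, not the arrow-rs API.
struct ColumnView {
    buffer: Rc<Vec<u8>>,
    offset: usize,
    len: usize,
}

impl ColumnView {
    /// The bytes this column logically owns inside the shared buffer.
    fn data(&self) -> &[u8] {
        &self.buffer[self.offset..self.offset + self.len]
    }
}

/// Builds `n_columns` views that each cover one slice of a single shared
/// buffer, modelling a batch read back from a spill file.
/// `n_columns` must be nonzero.
fn shared_batch(buffer_len: usize, n_columns: usize) -> Vec<ColumnView> {
    let buffer = Rc::new(vec![0u8; buffer_len]);
    let len = buffer_len / n_columns;
    (0..n_columns)
        .map(|i| ColumnView {
            buffer: Rc::clone(&buffer),
            offset: i * len,
            len,
        })
        .collect()
}

/// Naive estimate: charge every column for the full buffer it points into,
/// so a buffer shared by N columns is counted N times.
fn naive_size(columns: &[ColumnView]) -> usize {
    columns.iter().map(|c| c.buffer.len()).sum()
}

/// Deduplicated estimate: count each distinct allocation once, keyed by the
/// address of the shared buffer.
fn deduplicated_size(columns: &[ColumnView]) -> usize {
    let mut seen = HashSet::new();
    columns
        .iter()
        .filter(|c| seen.insert(Rc::as_ptr(&c.buffer)))
        .map(|c| c.buffer.len())
        .sum()
}

fn main() {
    let columns = shared_batch(3000, 3);
    let logical: usize = columns.iter().map(|c| c.data().len()).sum();
    println!("logical bytes:      {}", logical);
    println!("naive estimate:     {}", naive_size(&columns));
    println!("deduplicated bytes: {}", deduplicated_size(&columns));
}
```

With three columns sharing one 3000-byte buffer, the naive estimate in this model is 9000 bytes while the deduplicated one is 3000, which matches the 3X over-estimate described above. Whether the eventual arrow-rs fix deduplicates by buffer address in this way is not stated in this issue.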
   
   The root cause has already been reported in `arrow-rs`: 
https://github.com/apache/arrow-rs/issues/6363
   Once it is fixed in Arrow, we should check whether this aggregation query 
runs successfully, and also add tests.
   
   ### To Reproduce
   
   _No response_
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   _No response_



