alamb opened a new issue #898: URL: https://github.com/apache/arrow-datafusion/issues/898
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

When running DataFusion as part of a Rust program that has other substantial uses of memory (for example, buffers in IOx), we would like to know how much memory is allocated when running DataFusion plans so we can:

1. Allocate sufficient memory to DataFusion / limit other users of memory
2. Start tuning / working to limit memory usage by DataFusion (e.g. #587)

**Describe the solution you'd like**

A counter (perhaps an `AtomicUsize` tied to the `ExecutionContext` somehow) that tracks, across all DataFusion plans running in that context, how much memory has been allocated. This counter's value should be available both during plan execution and after it has completed.

The counter should:

1. Include memory allocated in RecordBatches *created* by DataFusion operators
2. Include memory used in intermediate buffers (e.g. hash tables, sort buffers, etc.), measured as "capacity" rather than "size" to reflect the heap usage of the program
3. Be decremented when memory is deallocated

Initially, a counter that captures the major allocations of memory would be ideal.

**Describe alternatives you've considered**

Implement a per-operator allocation tracking scheme (perhaps based on metrics, see #866 and https://github.com/apache/arrow-datafusion/issues/679). I think per-operator tracking of memory is also valuable and will file a separate ticket for that capability.

**Additional context**

This is likely a prerequisite for actually limiting memory usage for DataFusion plans as described in https://github.com/apache/arrow-datafusion/issues/587
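
For illustration, the counter described under "Describe the solution you'd like" could be sketched roughly as follows. This is only a sketch under assumptions: `MemoryTracker` and `Reservation` are hypothetical names and not existing DataFusion types, and how the tracker would be attached to the `ExecutionContext` is left open. Operators would report the *capacity* of their buffers, and the guard subtracts its bytes when dropped so that deallocation is reflected in the counter.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical context-wide counter of bytes currently allocated.
/// Not part of DataFusion; names are illustrative only.
#[derive(Debug, Default)]
pub struct MemoryTracker {
    allocated: AtomicUsize,
}

impl MemoryTracker {
    /// Create a tracker that can be shared by all plans in one context.
    pub fn new() -> Arc<Self> {
        Arc::new(Self::default())
    }

    /// Record `bytes` as allocated and return a guard that
    /// subtracts them again when it is dropped.
    pub fn track(self: &Arc<Self>, bytes: usize) -> Reservation {
        self.allocated.fetch_add(bytes, Ordering::Relaxed);
        Reservation {
            tracker: Arc::clone(self),
            bytes,
        }
    }

    /// Bytes currently tracked; readable during and after execution.
    pub fn allocated(&self) -> usize {
        self.allocated.load(Ordering::Relaxed)
    }
}

/// Guard representing one tracked allocation (e.g. a sort buffer).
#[derive(Debug)]
pub struct Reservation {
    tracker: Arc<MemoryTracker>,
    bytes: usize,
}

impl Reservation {
    /// Adjust the tracked amount after the underlying buffer grows or
    /// shrinks. Callers are assumed to pass the buffer's capacity in bytes.
    pub fn resize(&mut self, new_bytes: usize) {
        if new_bytes >= self.bytes {
            self.tracker
                .allocated
                .fetch_add(new_bytes - self.bytes, Ordering::Relaxed);
        } else {
            self.tracker
                .allocated
                .fetch_sub(self.bytes - new_bytes, Ordering::Relaxed);
        }
        self.bytes = new_bytes;
    }
}

impl Drop for Reservation {
    /// Decrement the shared counter when the allocation goes away.
    fn drop(&mut self) {
        self.tracker
            .allocated
            .fetch_sub(self.bytes, Ordering::Relaxed);
    }
}
```

In this sketch an operator would call `tracker.track(buf.capacity() * std::mem::size_of::<T>())` when it creates a buffer and `resize` when the buffer reallocates. `Ordering::Relaxed` is used on the assumption that the counter is purely informational and is not used to synchronize other memory accesses.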
