This is an automated email from the ASF dual-hosted git repository.

jonah pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git


The following commit(s) were added to refs/heads/main by this push:
     new 3bc77148c1 memory pool example (#12849)
3bc77148c1 is described below

commit 3bc77148c15c8a675c7d186c81ea54f1bcab2d42
Author: Yongting You <[email protected]>
AuthorDate: Fri Oct 11 14:44:19 2024 +0800

    memory pool example (#12849)
---
 datafusion/execution/src/memory_pool/mod.rs | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/datafusion/execution/src/memory_pool/mod.rs 
b/datafusion/execution/src/memory_pool/mod.rs
index dcd59acbd4..d87ce1ebfe 100644
--- a/datafusion/execution/src/memory_pool/mod.rs
+++ b/datafusion/execution/src/memory_pool/mod.rs
@@ -68,11 +68,35 @@ pub use pool::*;
 /// Note that a `MemoryPool` can be shared by concurrently executing plans,
 /// which can be used to control memory usage in a multi-tenant system.
 ///
+/// # How MemoryPool works by example
+///
+/// Scenario 1:
+/// For `Filter` operator, `RecordBatch`es will stream through it, so it
+/// don't have to keep track of memory usage through [`MemoryPool`].
+///
+/// Scenario 2:
+/// For `CrossJoin` operator, if the input size gets larger, the intermediate
+/// state will also grow. So `CrossJoin` operator will use [`MemoryPool`] to
+/// limit the memory usage.
+/// 2.1 `CrossJoin` operator has read a new batch, asked memory pool for
+/// additional memory. Memory pool updates the usage and returns success.
+/// 2.2 `CrossJoin` has read another batch, and tries to reserve more memory
+/// again, memory pool does not have enough memory. Since `CrossJoin` operator
+/// has not implemented spilling, it will stop execution and return an error.
+///
+/// Scenario 3:
+/// For `Aggregate` operator, its intermediate states will also accumulate as
+/// the input size gets larger, but with spilling capability. When it tries to
+/// reserve more memory from the memory pool, and the memory pool has already
+/// reached the memory limit, it will return an error. Then, `Aggregate`
+/// operator will spill the intermediate buffers to disk, and release memory
+/// from the memory pool, and continue to retry memory reservation.
+///
 /// # Implementing `MemoryPool`
 ///
 /// You can implement a custom allocation policy by implementing the
 /// [`MemoryPool`] trait and configuring a `SessionContext` appropriately.
-/// However, mDataFusion comes with the following simple memory pool 
implementations that
+/// However, DataFusion comes with the following simple memory pool 
implementations that
 /// handle many common cases:
 ///
 /// * [`UnboundedMemoryPool`]: no memory limits (the default)


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to