Yeah, this behavior is certainly a bit strange then.

The only alteration I am making is changing the way we create the Execution 
Context in the benchmark file.

Something like:

```
auto logging_pool = LoggingMemoryPool(default_memory_pool());
ExecContext ctx(&logging_pool, ...);
```
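
In fuller form (an untested sketch; the exact headers and how the plan actually 
gets built in the benchmark may differ), I'm picturing:

```
#include <arrow/compute/exec.h>
#include <arrow/compute/exec/exec_plan.h>
#include <arrow/memory_pool.h>
#include <arrow/result.h>

#include <memory>

// Untested sketch: the benchmark owns a LoggingMemoryPool that wraps the
// process-wide default pool; every plan is then built from an ExecContext
// configured with that pool.  (logging_pool is declared first so it outlives
// the ExecContext that points at it.)
struct LoggedBenchmarkState {
  arrow::LoggingMemoryPool logging_pool{arrow::default_memory_pool()};
  arrow::compute::ExecContext ctx{&logging_pool};
};

arrow::Result<std::shared_ptr<arrow::compute::ExecPlan>> MakeLoggedPlan(
    LoggedBenchmarkState* state) {
  return arrow::compute::ExecPlan::Make(&state->ctx);
}
```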

Is there anything else I'd need to change?

Beyond that, we should still expect to see some allocations from TableSourceNode 
going through the logging memory pool, even if AsofJoinNode were using the 
default memory pool instead of the exec plan's pool, but I am not seeing 
anything come through...

-----Original Message-----
From: Weston Pace <[email protected]>
Sent: Monday, July 11, 2022 2:47 PM
To: [email protected]
Subject: Re: cpp Memory Pool Clarification

Are you changing the default memory pool to a LoggingMemoryPool?
Where are you doing this?  For a benchmark I think you would need to change the 
implementation in the benchmark file itself.

Similarly, is AsofJoinNode using the default memory pool or the memory pool of 
the exec plan?  It should be using the latter exclusively, but it's easy to 
accidentally fall back to the default memory pool.  It probably won't make much 
of a difference at the end of the day, as benchmarks normally configure an exec 
plan to use the default memory pool, so the two pools would be the same.
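
To make the distinction concrete, a rough sketch (not AsofJoinNode's actual 
code) of the two ways a node can end up allocating:

```
#include <arrow/builder.h>
#include <arrow/compute/exec.h>
#include <arrow/status.h>

#include <memory>

// Illustration only: the first builder allocates through the exec plan's pool
// (the LoggingMemoryPool in the benchmark), the second one silently falls back
// to the process-wide default pool.
arrow::Status BuildSomething(arrow::compute::ExecContext* exec_ctx) {
  arrow::Int64Builder uses_plan_pool(exec_ctx->memory_pool());
  arrow::Int64Builder uses_default_pool;  // arrow::default_memory_pool()

  ARROW_RETURN_NOT_OK(uses_plan_pool.Append(42));
  std::shared_ptr<arrow::Array> out;
  return uses_plan_pool.Finish(&out);
}
```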

> My expectation is that we would see some pretty sizable calls to Allocate 
> when we begin to read files or to create tables, but that is not evident.

Yes, the materialization step of an asof join uses array builders, and those 
will be allocating buffers from a memory pool.

> 1) To my understanding, only large allocations will call Allocate. Are
> there allocations (for files, table objects), which despite being of
> large size, do not call Allocate?

No.  There is no size limit for the allocator.  When people were talking about 
"large allocations" and "small allocations" in the previous thread, it was more 
of a general concept.

For example, if I create an array builder, add some items to it, and then 
create an array then this will always use a memory pool for the allocation.  
This will be true even if I create an array with a single element in it (in 
which case the allocation is often padded for alignment purposes).
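
A quick way to see this (just a sketch against the default pool with an 
Int64Builder, not code from the benchmark):

```
#include <arrow/api.h>

#include <iostream>
#include <memory>

// Even a one-element array allocates from the memory pool, and the buffer is
// padded/aligned, so the delta is larger than sizeof(int64_t).
arrow::Status OneElementArray() {
  arrow::MemoryPool* pool = arrow::default_memory_pool();
  const int64_t before = pool->bytes_allocated();

  arrow::Int64Builder builder(pool);
  ARROW_RETURN_NOT_OK(builder.Append(1));
  std::shared_ptr<arrow::Array> array;
  ARROW_RETURN_NOT_OK(builder.Finish(&array));

  std::cout << "bytes allocated: " << (pool->bytes_allocated() - before)
            << std::endl;
  return arrow::Status::OK();
}
```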

On the other hand, schemas keep their fields in a std::vector which never uses 
the memory pool for allocation.  This is true even if I have 10,000 columns and 
the vector's memory is actually quite large.
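
For contrast, building a hypothetical 10,000-column schema leaves the pool 
statistics untouched:

```
#include <arrow/api.h>

#include <iostream>
#include <memory>
#include <string>
#include <vector>

// The fields live in a std::vector on the regular heap, so even a very wide
// schema never touches the Arrow memory pool.
void WideSchema() {
  arrow::MemoryPool* pool = arrow::default_memory_pool();
  const int64_t before = pool->bytes_allocated();

  std::vector<std::shared_ptr<arrow::Field>> fields;
  for (int i = 0; i < 10000; ++i) {
    fields.push_back(arrow::field("col" + std::to_string(i), arrow::int64()));
  }
  auto schema = arrow::schema(std::move(fields));

  std::cout << schema->num_fields() << " fields, pool delta: "
            << (pool->bytes_allocated() - before) << std::endl;  // delta is 0
}
```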

However, in general, arrays tend to be quite large and schemas tend to be quite 
small.

> 2) How can maximum_peak_memory be nonzero if we have not seen any
> calls to Allocate/Reallocate/Free?

I don't think that is possible.

On Mon, Jul 11, 2022 at 10:44 AM Ivan Chau <[email protected]> wrote:
>
> Hi all,
>
> I've been doing some testing with LoggingMemoryPool to benchmark our
> AsOfJoin implementation
> <https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/exec/asof_join_node.cc>.
> Our underlying memory pool for the LoggingMemoryPool is the
> default_memory_pool (this is process-wide).
>
> Curiously enough, I don't see any allocations, reallocations, or frees
> when we run our benchmarking code. I also see that the max_memory
> property of the memory pool (which is documented as the peak memory
> allocation) is nonzero (1.2e9 bytes).
>
> My expectation is that we would see some pretty sizable calls to
> Allocate when we begin to read files or to create tables, but that is not 
> evident.
>
> 1) To my understanding, only large allocations will call Allocate. Are
> there allocations (for files, table objects), which despite being of
> large size, do not call Allocate?
>
> 2) How can maximum_peak_memory be nonzero if we have not seen any
> calls to Allocate/Reallocate/Free?
>
> Thank you!
