Re: cpp Memory Pool Clarification

Li Jin Mon, 11 Jul 2022 14:04:46 -0700

> TableSourceNode wouldn't need to allocate since it runs against memory
> that's already been allocated.

Is the memory "that is already allocated" tracked by any allocator? For an
end-to-end benchmark of "scan - join - write", I think it would make sense
to include all Arrow memory allocations.
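
To make the question concrete, here is a toy model in plain C++. It is only
a sketch: ToyPool, BasePool, CountingWrapper, Slice and RunDemo are made-up
names for illustration and are not Arrow's classes, and it assumes that a
wrapping pool forwards its statistics to the pool it wraps. It shows a
buffer allocated before the wrapper exists, followed by a zero-copy slice:
the wrapper sees no Allocate calls, yet the byte total it reports is nonzero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Toy stand-ins (hypothetical, NOT Arrow's API): just enough to show
// which pool observes which call.
using Buffer = std::shared_ptr<std::vector<uint8_t>>;

class ToyPool {
 public:
  virtual ~ToyPool() = default;
  virtual Buffer Allocate(std::size_t size) = 0;
  virtual int64_t total_bytes() const = 0;
};

// Base pool: performs the allocation and tracks total bytes handed out.
class BasePool : public ToyPool {
 public:
  Buffer Allocate(std::size_t size) override {
    total_ += static_cast<int64_t>(size);
    return std::make_shared<std::vector<uint8_t>>(size);
  }
  int64_t total_bytes() const override { return total_; }

 private:
  int64_t total_ = 0;
};

// Wrapper: counts only the calls routed through it and forwards
// statistics to the wrapped pool (an assumption of this sketch).
class CountingWrapper : public ToyPool {
 public:
  explicit CountingWrapper(ToyPool* wrapped) : wrapped_(wrapped) {}
  Buffer Allocate(std::size_t size) override {
    ++calls_seen_;
    return wrapped_->Allocate(size);
  }
  int64_t total_bytes() const override { return wrapped_->total_bytes(); }
  int calls_seen() const { return calls_seen_; }

 private:
  ToyPool* wrapped_;
  int calls_seen_ = 0;
};

// A slice shares the parent's buffer; creating one allocates nothing.
struct Slice {
  Buffer data;
  std::size_t offset;
  std::size_t length;
};

Slice MakeSlice(const Buffer& data, std::size_t offset, std::size_t length) {
  assert(offset + length <= data->size());
  return Slice{data, offset, length};
}

struct DemoResult {
  int calls_seen_by_wrapper;
  int64_t total_bytes;
  bool slice_shares_buffer;
};

DemoResult RunDemo() {
  BasePool base;
  Buffer table = base.Allocate(1024);        // "table" built up front
  CountingWrapper wrapper(&base);            // wrapper created afterwards
  Slice batch = MakeSlice(table, 256, 128);  // "scan": zero-copy slice
  return DemoResult{wrapper.calls_seen(), wrapper.total_bytes(),
                    batch.data.get() == table.get()};
}
```

In this model, the input would only be counted by the wrapper if the table
itself were built through it, i.e. if the benchmark passed the wrapping pool
to whatever creates the input data.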


On Mon, Jul 11, 2022 at 4:37 PM Weston Pace <[email protected]> wrote:

> > Is there anything else I'd need to change?
>
> Maybe try something like this:
>
> https://github.com/westonpace/arrow/commit/15ac0d051136c585cda63297e48f17557808d898
>
> > Beyond that, we should also expect to see some allocations from
> > TableSourceNode going through the logging memory pool, even if
> > AsOfJoinNode was using the default memory pool instead of the Exec
> > Plan's pool, but I am not seeing anything come through...
>
> TableSourceNode wouldn't need to allocate since it runs against memory
> that's already been allocated.  It might split input into smaller
> batches but slicing tables / arrays is a zero-copy operation that does
> not require allocating new buffers.
>
> On Mon, Jul 11, 2022 at 12:46 PM Ivan Chau <[email protected]> wrote:
> >
> > Yeah this behavior is certainly a bit strange then.
> >
> > The only alteration I am making is changing the way we create the
> > Execution Context in the benchmark file.
> >
> > Something like:
> >
> > ```
> > auto logging_pool = LoggingMemoryPool(default_memory_pool());
> > ExecContext ctx(&logging_pool, ...);
> > ```
> >
> > Is there anything else I'd need to change?
> >
> > Beyond that, we should also expect to see some allocations from
> > TableSourceNode going through the logging memory pool, even if
> > AsOfJoinNode was using the default memory pool instead of the Exec
> > Plan's pool, but I am not seeing anything come through...
> >
> > -----Original Message-----
> > From: Weston Pace <[email protected]>
> > Sent: Monday, July 11, 2022 2:47 PM
> > To: [email protected]
> > Subject: Re: cpp Memory Pool Clarification
> >
> > Are you changing the default memory pool to a LoggingMemoryPool?
> > Where are you doing this?  For a benchmark I think you would need to
> > change the implementation in the benchmark file itself.
> >
> > Similarly, is AsofJoinNode using the default memory pool or the memory
> > pool of the exec plan?  It should be exclusively using the latter but
> > it's easy sometimes to overlook using the default memory pool.  It
> > probably won't make too much of a difference at the end of the day as
> > benchmarks normally configure an exec plan to use the default memory
> > pool and so the two pools would be the same.
> >
> > > My expectation is that we would see some pretty sizable calls to
> > > Allocate when we begin to read files or to create tables, but that
> > > is not evident.
> >
> > Yes, the materialization step of an asof join uses array builders and
> > those will be allocating buffers from a memory pool.
> >
> > > 1) To my understanding, only large allocations will call Allocate. Are
> > > there allocations (for files, table objects), which despite being of
> > > large size, do not call Allocate?
> >
> > No.  There is no size limit for the allocator.  Instead, when people
> > were talking about "large allocations" and "small allocations" in the
> > previous thread it was more of a general concept.
> >
> > For example, if I create an array builder, add some items to it, and
> > then create an array then this will always use a memory pool for the
> > allocation.  This will be true even if I create an array with a single
> > element in it (in which case the allocation is often padded for
> > alignment purposes).
> >
> > On the other hand, schemas keep their fields in a std::vector which
> > never uses the memory pool for allocation.  This is true even if I have
> > 10,000 columns and the vector's memory is actually quite large.
> >
> > However, in general, arrays tend to be quite large and schemas tend to
> > be quite small.
> >
> > > 2) How can maximum_peak_memory be nonzero if we have not seen any
> > > calls to Allocate/Reallocate/Free?
> >
> > I don't think that is possible.
> >
> > On Mon, Jul 11, 2022 at 10:44 AM Ivan Chau <[email protected]> wrote:
> > >
> > > Hi all,
> > >
> > > I've been doing some testing with LoggingMemoryPool to benchmark our
> > > AsOfJoin implementation
> > > <https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/exec/asof_join_node.cc>.
> > > Our underlying memory pool for the LoggingMemoryPool is the
> > > default_memory_pool (this is process-wide).
> > >
> > > Curiously enough, I don't see any allocations, reallocations, or frees
> > > when we run our benchmarking code. I also see that the max_memory
> > > property of the memory pool (which is documented as the peak memory
> > > allocation), is nonzero (1.2e9 bytes).
> > >
> > > My expectation is that we would see some pretty sizable calls to
> > > Allocate when we begin to read files or to create tables, but that is
> > > not evident.
> > >
> > > 1) To my understanding, only large allocations will call Allocate. Are
> > > there allocations (for files, table objects), which despite being of
> > > large size, do not call Allocate?
> > >
> > > 2) How can maximum_peak_memory be nonzero if we have not seen any
> > > calls to Allocate/Reallocate/Free?
> > >
> > > Thank you!
>
