Thanks Weston, that makes a lot of sense. Let me rephrase to make
sure I get this right.

So the main purpose of the minibatch is actually to keep the working set
within L1 (with the side benefit of more chances to take shortcuts).
This requires splitting the input batch into minibatches. Since each
minibatch has a fixed size and we want to avoid allocation, it is
desirable to reuse a preallocated buffer across minibatches. The
assumption in my original question, limiting the memory size of the
working set, is not the main consideration but another possible side
benefit, e.g., compared with having to calculate hashes for the whole
input batch at once in a hash join?
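To check my understanding, here is how I picture the pattern as a minimal C++ sketch. It is not Acero's actual code or API: `kMiniBatchLength`, `HashKey`, and `CountPerBucket` are hypothetical names, the hash is a toy, a per-bucket count stands in for whatever consumes the hashes (e.g. a hash table probe), and a plain `std::vector<bool>` stands in for Arrow's validity bitmap. It hashes keys one minibatch at a time into a fixed-size scratch buffer that is reused for every minibatch, and takes the all-null / no-null shortcuts.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minibatch length: 1024 uint64 hashes = 8 KB of scratch,
// small enough to stay resident in a typical L1 data cache.
constexpr std::size_t kMiniBatchLength = 1024;

// Toy multiplicative hash, for illustration only (not Acero's hash).
inline std::uint64_t HashKey(std::uint64_t key) {
  return key * 0x9E3779B97F4A7C15ULL;
}

// Counts non-null keys per hash bucket, hashing one minibatch at a time.
// validity[i] is true when row i is non-null.
std::vector<std::size_t> CountPerBucket(const std::vector<std::uint64_t>& keys,
                                        const std::vector<bool>& validity,
                                        std::size_t num_buckets) {
  assert(keys.size() == validity.size());
  assert(num_buckets > 0);
  std::vector<std::size_t> counts(num_buckets, 0);

  // Fixed-size scratch buffer, created once and reused for every minibatch,
  // instead of materializing hashes for the whole input batch.
  std::array<std::uint64_t, kMiniBatchLength> hashes;

  for (std::size_t start = 0; start < keys.size(); start += kMiniBatchLength) {
    const std::size_t length = std::min(kMiniBatchLength, keys.size() - start);

    // Count nulls in this minibatch to detect the special cases
    // (with a real bitmap this would be a cheap popcount).
    std::size_t null_count = 0;
    for (std::size_t i = 0; i < length; ++i) {
      if (!validity[start + i]) ++null_count;
    }
    if (null_count == length) continue;  // Shortcut: all null, nothing to hash.

    for (std::size_t i = 0; i < length; ++i) {
      hashes[i] = HashKey(keys[start + i]);
    }

    if (null_count == 0) {
      // Shortcut: no nulls, so skip the per-row validity check.
      for (std::size_t i = 0; i < length; ++i) {
        ++counts[hashes[i] % num_buckets];
      }
    } else {
      // General case: consult validity row by row.
      for (std::size_t i = 0; i < length; ++i) {
        if (validity[start + i]) ++counts[hashes[i] % num_buckets];
      }
    }
  }
  return counts;
}
```

The smaller the minibatch, the more often one of the two shortcuts applies, but the more per-minibatch overhead you pay; the larger it is, the more likely the scratch buffers spill out of L1.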

*Rossi*


On Wed, Jun 21, 2023 at 12:26, Weston Pace <weston.p...@gmail.com> wrote:

> Those goals are somewhat compatible.  Sasha can probably correct me if I
> get this wrong, but my understanding is that the minibatch is just large
> enough to ensure reliable vectorized execution.  It is used in some
> innermost critical sections both to keep the working set small (so it fits
> in L1) and to avoid allocation.
>
> In addition to ensuring things fit in L1 there is also, I believe, a side
> benefit of using small loops to increase the chances of encountering
> special cases (e.g. all values null or no values null) which can sometimes
> save you from more complex logic.
>
> On Tue, Jun 20, 2023 at 7:32 PM Ruoxi Sun <zanmato1...@gmail.com> wrote:
>
> > Hi,
> >
> > Looking at the Acero code, I'm curious about the concept of `minibatch`
> > as used in the swiss join and the grouper.
> > I wonder whether its purpose is to proactively limit the memory size of
> > the working set. Or is it a consequence of the temp vectors needing to be
> > fixed-size (to avoid costly memory allocation)? Additionally, what's the
> > impact of choosing the size of the minibatch?
> >
> > I'd really appreciate it if someone could help me clear this up.
> >
> > Thanks.
> >
> > *Rossi*
> >
>
