Those goals are somewhat compatible.  Sasha can probably correct me if I
get this wrong, but my understanding is that the minibatch is just large
enough to ensure reliable vectorized execution.  It is used in some
innermost critical sections both to keep the working set small (so it fits
in L1) and to avoid allocation.
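To make the allocation point concrete, here is a minimal Python sketch.
It is not Acero's actual code.  A long column is processed in fixed-size
minibatches, and the scratch buffer the inner kernel writes into is
allocated once, up front, and reused for every minibatch.  The function
`hash_column`, the multiplicative hash, and `MINIBATCH_SIZE = 1024` are
all assumptions made for illustration.

```python
from array import array

# Assumed minibatch length for illustration only; Acero defines its own.
MINIBATCH_SIZE = 1024
MASK64 = 0xFFFFFFFFFFFFFFFF
GOLDEN = 0x9E3779B97F4A7C15  # multiplicative-hash constant (assumption)


def hash_column(values, minibatch_size=MINIBATCH_SIZE):
    """Hash a column of ints minibatch by minibatch (hypothetical helper).

    The scratch buffer holds exactly one minibatch and is allocated once,
    before the loop.  Every minibatch reuses it, so its size is bounded by
    minibatch_size no matter how long `values` is.
    """
    out = array("Q", [0]) * len(values)           # output, allocated once
    scratch = array("Q", [0]) * minibatch_size    # fixed-size temp vector
    with memoryview(out) as out_view, memoryview(scratch) as tmp_view:
        for start in range(0, len(values), minibatch_size):
            n = min(minibatch_size, len(values) - start)
            # Inner kernel: touches only scratch[0:n] and this input slice.
            for i in range(n):
                scratch[i] = (values[start + i] * GOLDEN) & MASK64
            # Copy the finished minibatch out through views
            # (no intermediate buffer is created).
            out_view[start:start + n] = tmp_view[:n]
    return out
```

The size is a trade-off.  If it is too small, per-minibatch overhead
(loop setup, bookkeeping) dominates.  If it is too large, the scratch
data no longer fits in L1.  With the assumed 1024 rows and 8-byte hashes,
the scratch buffer is 8 KiB.  Even alongside a same-sized input slice, it
fits comfortably in a typical 32 KiB L1 data cache.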

In addition to ensuring things fit in L1, there is also, I believe, a side
benefit: small loops increase the chances of encountering special cases
(e.g. all values null or no values null), which can sometimes save you
from more complex logic.
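The special-case benefit can be illustrated with a minimal Python sketch.
It assumes a plain list of booleans as the validity mask.  In Arrow this
would be a validity bitmap, where the valid count for a range could be
taken with a popcount.  `sum_with_validity` and its per-minibatch `stats`
are hypothetical and exist only to show which path each minibatch takes.

```python
# Assumed minibatch length for illustration only.
MINIBATCH_SIZE = 1024


def sum_with_validity(values, valid, minibatch_size=MINIBATCH_SIZE):
    """Sum values[i] where valid[i] is True, one minibatch at a time.

    Returns (total, stats), where stats counts how many minibatches took
    each path: all-null (skipped), no-null (fast path), or mixed.
    """
    total = 0
    stats = {"all_null": 0, "no_null": 0, "mixed": 0}
    for start in range(0, len(values), minibatch_size):
        stop = min(start + minibatch_size, len(values))
        length = stop - start
        valid_count = sum(1 for ok in valid[start:stop] if ok)
        if valid_count == 0:
            # All null: nothing to add, skip the minibatch entirely.
            stats["all_null"] += 1
        elif valid_count == length:
            # No nulls: plain sum, no per-element validity check.
            stats["no_null"] += 1
            total += sum(values[start:stop])
        else:
            # Mixed: the general path consults the mask for every element.
            stats["mixed"] += 1
            total += sum(x for x, ok in zip(values[start:stop], valid[start:stop]) if ok)
    return total, stats
```

With one large batch, a single null anywhere forces the general path for
every row.  With small minibatches, long all-valid or all-null runs can
take the cheaper paths.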

On Tue, Jun 20, 2023 at 7:32 PM Ruoxi Sun <zanmato1...@gmail.com> wrote:

> Hi,
>
> Looking at the Acero code, I'm curious about the concept of `minibatch`
> used in the swiss join and the grouper.
> I wonder whether its purpose is to proactively limit the memory size of
> the working set, or whether it is a consequence of the temp vectors
> needing to be fixed-size (to avoid costly memory allocation).
> Additionally, what's the impact of the choice of minibatch size?
>
> I'd really appreciate it if someone could help me clear this up.
>
> Thanks.
>
> *Rossi*
>
