Those goals are somewhat compatible. Sasha can probably correct me if I get this wrong, but my understanding is that the minibatch is just large enough to ensure reliable vectorized execution. It is used in some of the innermost critical sections, both to keep the working set small (so that it fits in the L1 cache) and to avoid allocation, since fixed-size temporary vectors can be reused from one minibatch to the next.
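
To make that concrete, here is a minimal sketch of the pattern as I understand it. It is not Acero's actual code: `kMiniBatchLength`, `MixRow` and `SumMixed` are hypothetical names, and the size of 1024 is only an assumption chosen so that the scratch buffer (4 KiB) fits easily in a typical L1 data cache.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mini-batch size: 1024 rows * 4 bytes = 4 KiB of scratch space.
constexpr std::size_t kMiniBatchLength = 1024;

// Hypothetical per-row work: a cheap integer mix standing in for a hash.
inline std::uint32_t MixRow(std::uint32_t x) {
  x ^= x >> 16;
  x *= 0x85ebca6bU;
  x ^= x >> 13;
  return x;
}

// Sums the mixed values of `input`, processing one mini-batch at a time.
std::uint64_t SumMixed(const std::vector<std::uint32_t>& input) {
  // Fixed-size scratch buffer: created once and reused for every mini-batch,
  // so the inner loops never allocate.
  std::array<std::uint32_t, kMiniBatchLength> scratch;
  std::uint64_t total = 0;

  for (std::size_t start = 0; start < input.size(); start += kMiniBatchLength) {
    const std::size_t length = std::min(kMiniBatchLength, input.size() - start);

    // Pass 1: a simple, branch-free loop that a compiler can vectorize.
    for (std::size_t i = 0; i < length; ++i) {
      scratch[i] = MixRow(input[start + i]);
    }

    // Pass 2: consume the intermediate results while they are still in cache.
    for (std::size_t i = 0; i < length; ++i) {
      total += scratch[i];
    }
  }
  return total;
}
```

The trade-off in choosing the size is roughly this: too small and the per-batch overhead dominates and the loops are too short to vectorize well, too large and the intermediate buffers spill out of L1 between passes.
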
In addition to ensuring things fit in L1 there is also, I believe, a side benefit of using small loops to increase the chances of encountering special cases (e.g. all values null or no values null) which can sometimes save you from more complex logic.

On Tue, Jun 20, 2023 at 7:32 PM Ruoxi Sun <zanmato1...@gmail.com> wrote:

> Hi,
>
> By looking at acero code, I'm curious about the concept `minibatch` being
> used in swiss join and grouper.
> I wonder if its purpose is to proactively limit the memory size of the
> working set? Or is it the consequence of that the temp vector should be
> fix-sized (to avoid costly memory allocation)? Additionally, what's the
> impact of choosing the size of the minibatch?
>
> Really appreciate if someone can help me to clear this.
>
> Thanks.
>
> *Rossi*