hi Antoine,

On Tue, Apr 21, 2020 at 4:54 AM Antoine Pitrou <anto...@python.org> wrote:
>
> On 21/04/2020 at 11:13, Antoine Pitrou wrote:
> It would be interesting to know how costly repeated
> allocation/deallocation is. Modern allocators like jemalloc do their
> own caching instead of always returning memory to the system. We could
> also have our own caching layer.

I'm sure that we can do our own experiments. The more significant issue, based on my understanding of the academic research, is actually memory bandwidth as it relates to the CPU memory hierarchy. This is discussed at length in the X100 paper, so I'll direct you to the research rather than debate it here. Since that paper was written in 2005, you might also look for papers from the last 15 years that cite it, to see whether there has been follow-on research as CPU architectures have evolved.

> > This assumes that all these kernels can safely write into one of their
> > inputs. This should be true for trivial ones, but not if e.g. a kernel
> > makes two passes over its input. For example, the SortToIndices kernel
> > first scans the input for min and max values, and then switches on two
> > different sorting algorithms depending on those statistics (using a O(n)
> > counting sort if the values are in a small enough range).
>
> That said, in the SortToIndices case, this wouldn't be a problem, since
> only the second pass writes to the output.

This kernel is not valid for normal array-exprs (see the spreadsheet I linked), such as what you can write in SQL. Kernels like SortToIndices are a different type of function (in other words, "not a SQL function"), so if we choose to allow such "non-SQL-like" functions in the expression evaluator, then different logic must be used.

> Regards
>
> Antoine.
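P.S. For anyone following along, here is a minimal sketch of what "our own caching layer" over an allocator could look like. All names here (`CachingPool`, `allocate_raw`) are invented for illustration; this is not Arrow's MemoryPool API, just the jemalloc-style idea of keeping freed blocks in per-size free lists instead of returning them to the system:

```python
# Illustrative sketch only -- not Arrow's MemoryPool API.
# Freed blocks are cached in per-size free lists and reused on the
# next allocation of the same size, avoiding a round trip to the
# underlying allocator.

class CachingPool:
    def __init__(self, allocate_raw):
        self._allocate_raw = allocate_raw  # e.g. bytearray
        self._free_lists = {}              # size -> list of cached blocks

    def allocate(self, size):
        cached = self._free_lists.get(size)
        if cached:
            return cached.pop()            # reuse without hitting the allocator
        return self._allocate_raw(size)

    def free(self, block):
        # Cache the block for later reuse instead of releasing it.
        self._free_lists.setdefault(len(block), []).append(block)
```

Whether this beats jemalloc's own thread caches is exactly the kind of thing the experiments above would have to measure.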
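P.P.S. To make the SortToIndices discussion concrete, here is a hedged sketch of the two-pass dispatch Antoine describes: a first pass scans the input for min/max, and the kernel then picks an O(n) counting sort when the value range is small, falling back to a comparison sort otherwise. The function name and threshold are invented for the example; this is the general technique, not the actual Arrow C++ implementation:

```python
# Sketch of a two-pass sort-to-indices kernel (illustrative, not
# Arrow's code). Pass 1 computes min/max statistics; pass 2 picks a
# sorting algorithm based on them and writes only to the output.

def sort_to_indices(values, max_counting_range=2**16):
    if not values:
        return []
    # Pass 1: scan the input for min and max.
    lo, hi = min(values), max(values)
    if hi - lo < max_counting_range:
        # Small range: O(n) counting sort. Bucket input positions by
        # value, then emit them bucket by bucket (stable).
        buckets = [[] for _ in range(hi - lo + 1)]
        for idx, v in enumerate(values):
            buckets[v - lo].append(idx)
        return [idx for bucket in buckets for idx in bucket]
    # Large range: fall back to a comparison-based argsort.
    return sorted(range(len(values)), key=values.__getitem__)
```

Note that even though the kernel reads its input twice, only the second pass produces output, which is why in-place output allocation happens not to break it, as Antoine points out.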