Don wrote:
Multidimensional slices normally result in appallingly inefficient use of caches.
Indeed, cache usage is a challenge. My general approach would be fairly conservative: give the user full control over memory layout, and make exercising that control as convenient as possible. Provide good performance for straightforward code, but allow the user to tweak the details to improve performance further.
A library function that takes several arrays as input and output should allow arbitrary memory layouts, but it should also specify which memory layout is most efficient.
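To make that concrete, here is a rough sketch of what I have in mind. It is written in Python only for illustration, and every name in it (View2D, row_major, column_major, row_sums, is_preferred_for_row_sums) is hypothetical, not part of any existing library. A view is a flat buffer plus a shape and strides, so the same function works on either layout and can also report which layout it traverses contiguously.

```python
class View2D:
    """Hypothetical 2-D view over a flat buffer; strides are counted in elements."""

    def __init__(self, data, shape, strides, offset=0):
        self.data = data
        self.shape = shape
        self.strides = strides
        self.offset = offset

    def __getitem__(self, idx):
        i, j = idx
        # Map the logical index (i, j) to a position in the flat buffer.
        return self.data[self.offset + i * self.strides[0] + j * self.strides[1]]


def row_major(rows, cols, data):
    """Elements of one row are adjacent in memory."""
    return View2D(data, (rows, cols), (cols, 1))


def column_major(rows, cols, data):
    """Elements of one column are adjacent in memory."""
    return View2D(data, (rows, cols), (1, rows))


def row_sums(view):
    """Accepts any layout. The inner loop runs along a row, so the
    traversal is contiguous only when the last-axis stride is 1."""
    rows, cols = view.shape
    return [sum(view[i, j] for j in range(cols)) for i in range(rows)]


def is_preferred_for_row_sums(view):
    """The layout this function would document as its most efficient one."""
    return view.strides[1] == 1


# The same logical matrix [[1, 2, 3], [4, 5, 6]] stored in two layouts.
a = row_major(2, 3, [1, 2, 3, 4, 5, 6])
b = column_major(2, 3, [1, 4, 2, 5, 3, 6])

print(row_sums(a), is_preferred_for_row_sums(a))
print(row_sums(b), is_preferred_for_row_sums(b))
```

Both calls give the same sums; only the order in which the buffer is touched differs. That is the split I would want: correctness for every layout, plus a documented preferred layout for users who care about speed.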
In any case, I think the expressiveness of multidimensional slices makes them worth having, even if performance is not optimal in every case with the first generation of libraries.
