Thanks! That eliminated the extra allocations, but I don't understand why it works. The individual functions don't use any global variables, and the array I'm passing has a concrete type, so why would the function be compiled with a heap-allocated accumulator at all?
Also, does this mean that, to get performant code, I would ultimately need to wrap everything in a main function that takes no arguments? That makes it tricky to test and profile the code incrementally.

On Thursday, May 19, 2016 at 11:22:04 PM UTC-7, Kristoffer Carlsson wrote:
>
> Do your timings in a function.
