On 06/04/2018 07:08 PM, Ethan wrote:
> On Monday, 4 June 2018 at 18:11:47 UTC, Steven Schveighoffer wrote:
>> BTW, do you have cross-module inlining on?
>
> Just to drive this point home.
>
> https://run.dlang.io/is/nrdzb0
>
> I manually implemented stride and fill with everything forced inline. Otherwise, the original code is unchanged.
>
> 17 ms, 891 μs, and 6 hnsecs
> 15 ms, 694 μs, and 1 hnsec
> 15 ms, 570 μs, and 9 hnsecs
>
> My new stride outperformed both std.range's stride and the manual for-loop. And because the third test uses the new stride, it also benefited. Interestingly, though, it runs ever so slightly faster...

BTW, I've long had the idea of implementing stride with a compile-time step, but never got around to it. It would generalize the existing code without much work: essentially, the step becomes a template parameter, and if it is 0, a run-time stride is used instead. Most of the code would work unchanged.
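A minimal sketch of that compile-time-step idea, assuming plain slices as input rather than arbitrary ranges. The names `StrideOver`, `strideCT`, and `strideRT` are hypothetical and are not part of Phobos. `pragma(inline, true)` is placed on the range primitives to mirror the forced inlining mentioned above. A nonzero `step` is a compile-time constant baked into the type; with `step == 0`, the step lives in a run-time field, and the primitives are otherwise shared.

```d
import std.algorithm.comparison : equal;
import std.stdio : writeln;

// Hypothetical strided view over a slice (not the Phobos std.range.stride).
// A nonzero `step` is a compile-time constant baked into the type;
// `step == 0` means the stride is read from a run-time field instead.
struct StrideOver(size_t step, T)
{
    private T[] data;

    static if (step == 0)
    {
        private size_t runtimeStep; // set once at construction
        private @property size_t stepSize() const { return runtimeStep; }
    }
    else
    {
        private enum size_t stepSize = step; // a manifest constant
    }

    // The primitives below are shared by both variants; only stepSize differs.
    pragma(inline, true) @property bool empty() const { return data.length == 0; }

    pragma(inline, true) @property ref T front() { return data[0]; }

    pragma(inline, true) void popFront()
    {
        // Advance by stepSize, or become empty if that runs past the end.
        data = data.length > stepSize ? data[stepSize .. $] : data[$ .. $];
    }
}

// Hypothetical helper: compile-time step, e.g. strideCT!3(arr).
auto strideCT(size_t step, T)(T[] arr) if (step != 0)
{
    return StrideOver!(step, T)(arr);
}

// Hypothetical helper: run-time step, i.e. the step == 0 instantiation.
auto strideRT(T)(T[] arr, size_t n)
{
    assert(n != 0, "stride must be nonzero");
    return StrideOver!(0, T)(arr, n);
}

void main()
{
    int[] a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
    assert(strideCT!3(a).equal([0, 3, 6, 9]));
    assert(strideRT(a, 4).equal([0, 4, 8]));
    writeln(strideCT!3(a)); // [0, 3, 6, 9]
}
```

Only the source of the step differs between the two instantiations; `empty`, `front`, and `popFront` are written once. With a compile-time step the constant is visible to the optimizer in the slicing arithmetic, which is the hoped-for benefit, though whether it is measurable would need benchmarking like the timings above.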
