On Monday, 4 June 2018 at 23:08:17 UTC, Ethan wrote:
On Monday, 4 June 2018 at 18:11:47 UTC, Steven Schveighoffer
wrote:
BTW, do you have cross-module inlining on?
Just to drive this point home.
https://run.dlang.io/is/nrdzb0
Manually implemented stride and fill with everything forced
inline. Otherwise, the original code is unchanged.
17 ms, 891 μs, and 6 hnsecs
15 ms, 694 μs, and 1 hnsec
15 ms, 570 μs, and 9 hnsecs
My new stride outperformed both std.range's stride and the manual
for-loop. Because the third test uses the new stride, it also
benefited, and interestingly it runs ever so slightly faster...
Just as an aside:
...
pragma( inline ) @property length() const { return range.length / strideCount; }
pragma( inline ) @property empty() const { return currFront > range.length; }
pragma( inline ) @property ref Elem front() { return range[ currFront ]; }
pragma( inline ) void popFront() { currFront += strideCount; }
...
pragma( inline ) auto stride( Range )( Range r, int a )
...
pragma( inline ) auto fill( Range, Value )( Range r, Value v )
...
pragma(inline), without any argument, does not force inlining. It
actually does nothing; it just specifies that the
"implementation's default behaviour" should be used. You have to
annotate with pragma(inline, true) to force inlining
(https://dlang.org/spec/pragma.html#inline).
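To make the difference concrete, here is a minimal sketch of the
three forms of the pragma. The names Strided and strided are
hypothetical and only illustrate the annotation; this is not the
code from the run.dlang.io link above, and it only handles slices.

```d
import std.stdio : writeln;

// Hypothetical minimal strided view over a slice (illustration only).
struct Strided(T)
{
    T[] range;
    size_t step;   // assumed to be greater than zero
    size_t index;

    // pragma(inline) with no argument: implementation's default behaviour.
    pragma(inline) @property size_t sourceLength() const { return range.length; }

    // pragma(inline, true): ask the compiler to always inline these.
    pragma(inline, true) @property bool empty() const { return index >= range.length; }
    pragma(inline, true) @property ref T front() { return range[index]; }
    pragma(inline, true) void popFront() { index += step; }

    // pragma(inline, false): ask the compiler never to inline this one.
    pragma(inline, false) void reset() { index = 0; }
}

pragma(inline, true)
auto strided(T)(T[] r, size_t step)
{
    return Strided!T(r, step, 0);
}

// Sums every step-th element, starting at index 0.
int sumStrided(int[] data, size_t step)
{
    int sum = 0;
    foreach (x; data.strided(step))
        sum += x;
    return sum;
}

void main()
{
    // Visits indices 0, 3 and 6, so 0 + 3 + 6.
    writeln(sumStrided([0, 1, 2, 3, 4, 5, 6], 3)); // prints 9
}
```

Note that with pragma(inline, true) the compiler is expected to
report an error if it cannot inline the function, so it is best kept
to small accessors like the ones above.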
When I change all the pragma(inline) to pragma(inline, true),
there is a non-trivial speedup:
14 ms, 517 μs, and 9 hnsecs
13 ms, 110 μs, and 1 hnsec
13 ms, 199 μs, and 9 hnsecs
With ldc-beta, the second and third timings drop slightly further:
14 ms, 520 μs, and 4 hnsecs
13 ms, 87 μs, and 2 hnsecs
12 ms, 938 μs, and 8 hnsecs