On Monday, 4 June 2018 at 23:08:17 UTC, Ethan wrote:
On Monday, 4 June 2018 at 18:11:47 UTC, Steven Schveighoffer wrote:
BTW, do you have cross-module inlining on?

Just to drive this point home.

https://run.dlang.io/is/nrdzb0

Manually implemented stride and fill with everything forced inline. Otherwise, the original code is unchanged.

17 ms, 891 μs, and 6 hnsecs
15 ms, 694 μs, and 1 hnsec
15 ms, 570 μs, and 9 hnsecs

My new stride outperformed both std.range's stride and the manual for-loop. And, because the third test uses the new stride, it also benefited. Interestingly, it runs ever so slightly faster...

Just as an aside:

    ...
    pragma( inline ) @property length() const { return range.length / strideCount; }
    pragma( inline ) @property empty() const { return currFront > range.length; }
    pragma( inline ) @property ref Elem front() { return range[ currFront ]; }
    pragma( inline ) void popFront() { currFront += strideCount; }
    ...

    pragma( inline ) auto stride( Range )( Range r, int a )
    ...

    pragma( inline ) auto fill( Range, Value )( Range r, Value v )
    ...

pragma(inline), without any argument, does not force inlining. It actually does nothing; it just specifies that the "implementation's default behaviour" should be used. You have to annotate with pragma(inline, true) to force inlining (https://dlang.org/spec/pragma.html#inline).
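
To make the difference concrete, here is a minimal sketch of the two spellings side by side. It is not the code from the run.dlang.io link: the Strided struct and the sumStrided helper are hypothetical names made up for illustration, and the sketch only assumes the pragma behaviour described in the spec linked above.

```d
import std.stdio : writeln;

// Hypothetical strided view over an int slice (illustration only,
// not the implementation from the linked snippet).
struct Strided
{
    int[] data;
    size_t step;
    size_t pos;

    // No argument: the implementation's default inlining behaviour is used,
    // so this is equivalent to writing no pragma at all.
    pragma(inline) @property bool empty() const { return pos >= data.length; }

    // Second argument true: the compiler is asked to always inline.
    pragma(inline, true) @property ref int front() { return data[pos]; }
    pragma(inline, true) void popFront() { pos += step; }
}

// Hypothetical helper: sums every step-th element, starting at index 0.
int sumStrided(int[] data, size_t step)
{
    int sum = 0;
    foreach (x; Strided(data, step, 0))
        sum += x;
    return sum;
}

void main()
{
    // Visits indices 0, 2, 4 and 6.
    writeln(sumStrided([0, 1, 2, 3, 4, 5, 6, 7], 2));
}
```

Whether the forced version is measurably faster will depend on the compiler and flags, so it is worth timing both variants, as with the numbers in this thread.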

When I change all the pragma(inline) to pragma(inline, true), there is a non-trivial speedup:

14 ms, 517 μs, and 9 hnsecs
13 ms, 110 μs, and 1 hnsec
13 ms, 199 μs, and 9 hnsecs

There are further reductions using ldc-beta:

14 ms, 520 μs, and 4 hnsecs
13 ms, 87 μs, and 2 hnsecs
12 ms, 938 μs, and 8 hnsecs
