Whoops, wrong "original" version. That was the one before I
understood the game. Here mine is fixed (but up to 180 lines - 16
labels = 164 instructions):
http://goo.gl/wjIAVm
On Monday, 24 March 2014 at 17:14:11 UTC, dnspies wrote:
It seems like mine wasn't being inlined because I had
carelessly replaced char[] with const(char)[] in the function
signature (I don't know why that should make a difference, but
it does)
Going back to my original version (with Andre's trick to avoid
letting the loop unroll), I get
dnspies_orig 159 - 16 labels = 143 instructions
However, MFortin1 can be made to do better by taking successive
slices instead of indexing (and commenting out the check, since
it no longer helps optimize).
MFortin1_slice 137 - 13 labels = 124 instructions
http://goo.gl/JAgVK1
On Monday, 24 March 2014 at 13:39:45 UTC, Michel Fortin wrote:
On 2014-03-24 06:08:30 +0000, "safety0ff"
<[email protected]> said:
Everything seems to work in your corrected versions:
http://dpaste.dzfl.pl/3b32535559ba
Andrei's version doesn't compile on recent compiler versions
due to goto skipping initialization.
Ok, so let's check what happens in actual usage. In other
words, what happens when front is inlined and used in a loop.
I'm counting the number of assembly lines of this main
function for looping through each possible input using the
various implementations:
http://goo.gl/QjnA2k
implementation line count of main in assembly
andralex 285 - 9 labels = 276 instructions
MFortin1 135 - 13 labels = 122 instructions
MFortin2 150 - 14 labels = 136 instructions
safety0ff -- not inlined (too big?) --
dnspies 161 - 15 labels = 146 instructions
CWilliams 233 - 25 labels = 208 instructions
For compactness, my first implementation seems to be the
winner, both with and without inlining.
That said, I think front should only assume a non-empty char[]
and thus should check the length before accessing anything
after the first byte to protect against incomplete multi-byte
sequences such as [0x1000_0000]. So I added this line to my
two implementations:
if (1+tailLength >= s.length) return dchar.init;
Now lets see what happens with those changes:
http://goo.gl/XPCGYE
implementation line count of main in assembly
MFortin1Check 103 - 11 labels = 92 instructions
MFortin2Check 135 - 13 labels = 122 instructions
Now I'm baffled. Adding a test makes the code shorter? It
actually make the standalone functions longer by 8
instructions (as expected), but the inlined version is
shorter. My guess is that the optimizer creates something
entirely different and it turns out that this different
version optimises better after inlining.
That said, my test "main" function isn't very representative
of the general case because the length is statically known by
the optimizer. Let's see what it does when the length is not
statically known:
http://goo.gl/E2Q0Yu
implementation line count of main in assembly
andralex 384 - 9 labels = 375 instructions
MFortin1 173 - 19 labels = 154 instructions
MFortin2 182 - 18 labels = 164 instructions
safety0ff -- not inlined --
dnspies -- not inlined --
CWilliams 229 - 23 labels = 206 instructions
MFortin1Check 211 - 24 labels = 187 instructions
MFortin2Check 218 - 21 labels = 197 instructions
So again MFortin1 is the winner for compactness. Still, I
maintain that we ought to at least check the length of the
array for multi-byte sequences to protect against incomplete
sequences that could lay at the end of the string, so I'd
favor MFortin1Check.