It seems like mine wasn't being inlined because I had carelessly
replaced char[] with const(char)[] in the function signature (I
don't know why that should make a difference, but it does)
Going back to my original version (with Andre's trick to avoid
letting the loop unroll), I get
dnspies_orig 159 - 16 labels = 143 instructions
However, MFortin1 can be made to do better by taking successive
slices instead of indexing (and commenting out the check, since
it no longer helps optimize).
MFortin1_slice 137 - 13 labels = 124 instructions
http://goo.gl/JAgVK1
On Monday, 24 March 2014 at 13:39:45 UTC, Michel Fortin wrote:
On 2014-03-24 06:08:30 +0000, "safety0ff"
<[email protected]> said:
Everything seems to work in your corrected versions:
http://dpaste.dzfl.pl/3b32535559ba
Andrei's version doesn't compile on recent compiler versions
due to goto skipping initialization.
Ok, so let's check what happens in actual usage. In other
words, what happens when front is inlined and used in a loop.
I'm counting the number of assembly lines of this main function
for looping through each possible input using the various
implementations:
http://goo.gl/QjnA2k
implementation line count of main in assembly
andralex 285 - 9 labels = 276 instructions
MFortin1 135 - 13 labels = 122 instructions
MFortin2 150 - 14 labels = 136 instructions
safety0ff -- not inlined (too big?) --
dnspies 161 - 15 labels = 146 instructions
CWilliams 233 - 25 labels = 208 instructions
For compactness, my first implementation seems to be the
winner, both with and without inlining.
That said, I think front should only assume a non-empty char[]
and thus should check the length before accessing anything
after the first byte to protect against incomplete multi-byte
sequences such as [0x1000_0000]. So I added this line to my two
implementations:
if (1+tailLength >= s.length) return dchar.init;
Now lets see what happens with those changes:
http://goo.gl/XPCGYE
implementation line count of main in assembly
MFortin1Check 103 - 11 labels = 92 instructions
MFortin2Check 135 - 13 labels = 122 instructions
Now I'm baffled. Adding a test makes the code shorter? It
actually make the standalone functions longer by 8 instructions
(as expected), but the inlined version is shorter. My guess is
that the optimizer creates something entirely different and it
turns out that this different version optimises better after
inlining.
That said, my test "main" function isn't very representative of
the general case because the length is statically known by the
optimizer. Let's see what it does when the length is not
statically known:
http://goo.gl/E2Q0Yu
implementation line count of main in assembly
andralex 384 - 9 labels = 375 instructions
MFortin1 173 - 19 labels = 154 instructions
MFortin2 182 - 18 labels = 164 instructions
safety0ff -- not inlined --
dnspies -- not inlined --
CWilliams 229 - 23 labels = 206 instructions
MFortin1Check 211 - 24 labels = 187 instructions
MFortin2Check 218 - 21 labels = 197 instructions
So again MFortin1 is the winner for compactness. Still, I
maintain that we ought to at least check the length of the
array for multi-byte sequences to protect against incomplete
sequences that could lay at the end of the string, so I'd favor
MFortin1Check.