Whoops, wrong "original" version. That was the one before I understood the game. Here mine is fixed (but up to 180 lines - 16 labels = 164 instructions):

http://goo.gl/wjIAVm

On Monday, 24 March 2014 at 17:14:11 UTC, dnspies wrote:
It seems like mine wasn't being inlined because I had carelessly replaced char[] with const(char)[] in the function signature (I don't know why that should make a difference, but it does)

Going back to my original version (with Andre's trick to avoid letting the loop unroll), I get
dnspies_orig   159 - 16 labels = 143 instructions

However, MFortin1 can be made to do better by taking successive slices instead of indexing (and commenting out the check, since it no longer helps optimize).
MFortin1_slice 137 - 13 labels = 124 instructions

http://goo.gl/JAgVK1

On Monday, 24 March 2014 at 13:39:45 UTC, Michel Fortin wrote:
On 2014-03-24 06:08:30 +0000, "safety0ff" <[email protected]> said:

Everything seems to work in your corrected versions:
http://dpaste.dzfl.pl/3b32535559ba

Andrei's version doesn't compile on recent compiler versions due to goto skipping initialization.

Ok, so let's check what happens in actual usage. In other words, what happens when front is inlined and used in a loop. I'm counting the number of assembly lines of this main function for looping through each possible input using the various implementations:

http://goo.gl/QjnA2k

implementation   line count of main in assembly
andralex         285 -  9 labels = 276 instructions
MFortin1         135 - 13 labels = 122 instructions
MFortin2         150 - 14 labels = 136 instructions
safety0ff         -- not inlined (too big?) --
dnspies          161 - 15 labels = 146 instructions
CWilliams        233 - 25 labels = 208 instructions

For compactness, my first implementation seems to be the winner, both with and without inlining.

That said, I think front should only assume a non-empty char[] and thus should check the length before accessing anything after the first byte to protect against incomplete multi-byte sequences such as [0x1000_0000]. So I added this line to my two implementations:

  if (1+tailLength >= s.length) return dchar.init;

Now lets see what happens with those changes:

http://goo.gl/XPCGYE

implementation   line count of main in assembly
MFortin1Check    103 - 11 labels =  92 instructions
MFortin2Check    135 - 13 labels = 122 instructions

Now I'm baffled. Adding a test makes the code shorter? It actually make the standalone functions longer by 8 instructions (as expected), but the inlined version is shorter. My guess is that the optimizer creates something entirely different and it turns out that this different version optimises better after inlining.

That said, my test "main" function isn't very representative of the general case because the length is statically known by the optimizer. Let's see what it does when the length is not statically known:

http://goo.gl/E2Q0Yu

implementation   line count of main in assembly
andralex         384 -  9 labels = 375 instructions
MFortin1         173 - 19 labels = 154 instructions
MFortin2         182 - 18 labels = 164 instructions
safety0ff         -- not inlined --
dnspies           -- not inlined --
CWilliams        229 - 23 labels = 206 instructions
MFortin1Check    211 - 24 labels = 187 instructions
MFortin2Check    218 - 21 labels = 197 instructions

So again MFortin1 is the winner for compactness. Still, I maintain that we ought to at least check the length of the array for multi-byte sequences to protect against incomplete sequences that could lay at the end of the string, so I'd favor MFortin1Check.

Reply via email to