On Monday, 3 August 2015 at 16:50:42 UTC, John Colvin wrote:
Making SubFoo a final class and test take SubFoo gives a >10x speedup for me.

Right, gdc and ldc will do the aggressive inlining and local data optimizations automatically once they are able to devirtualize the calls (at least when you use the -O flags).
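A rough sketch of what "make SubFoo final and have test take SubFoo" means. Only `SubFoo` and `test` come from the thread; `Foo`, `step`, `value` and `testBase` are hypothetical names, and the body of `step` is inferred from the `lea esi,[eax+eax*2+0x1]` in the disassembly below, so treat this as an assumed reconstruction rather than the original benchmark:

```d
// Hypothetical reconstruction: Foo, step, value and testBase are assumed names.
abstract class Foo
{
    int value;
    abstract void step();   // virtual: a call through a Foo reference goes via the vtable
}

// final: no class can derive from SubFoo, so SubFoo.step cannot be overridden
// and a call through a SubFoo reference can be resolved (and inlined) statically.
final class SubFoo : Foo
{
    override void step() { value = value * 3 + 1; }
}

// Taking the base class: the compiler generally cannot tell which step() runs.
int testBase(Foo obj, int n)
{
    foreach (i; 0 .. n)
        obj.step();
    return obj.value;
}

// Taking the final class: the call target is known, so it can be devirtualized.
int test(SubFoo obj, int n)
{
    foreach (i; 0 .. n)
        obj.step();
    return obj.value;
}
```

Both functions compute the same result; the only difference is how much the compiler can prove about the call target.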

dmd, however, even with -inline, doesn't make a local copy of the variable; it disassembles to this:

08098740 <_D1l4testFC1l6SubFooiZi>:
 8098740:       55                      push   ebp
 8098741:       8b ec                   mov    ebp,esp
 8098743:       89 c1                   mov    ecx,eax
 8098745:       53                      push   ebx
 8098746:       31 d2                   xor    edx,edx
 8098748:       8b 5d 08                mov    ebx,DWORD PTR [ebp+0x8]
 809874b:       56                      push   esi
 809874c:       85 c9                   test   ecx,ecx
 809874e:       7e 0f                   jle    809875f <_D1l4testFC1l6SubFooiZi+0x1f>
 8098750:       8b 43 08                mov    eax,DWORD PTR [ebx+0x8]
 8098753:       8d 74 40 01             lea    esi,[eax+eax*2+0x1]
 8098757:       42                      inc    edx
 8098758:       89 73 08                mov    DWORD PTR [ebx+0x8],esi
 809875b:       39 ca                   cmp    edx,ecx
 809875d:       7c f1                   jl     8098750 <_D1l4testFC1l6SubFooiZi+0x10>
 809875f:       8b 43 08                mov    eax,DWORD PTR [ebx+0x8]
 8098762:       5e                      pop    esi
 8098763:       5b                      pop    ebx
 8098764:       5d                      pop    ebp
 8098765:       c2 04 00                ret    0x4

There's no call in there, but the loop still loads and stores the field through the object pointer ([ebx+0x8]) on every iteration, so it doesn't get the benefit of working on a local copy that could stay in a register.
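To make the difference concrete, here is a sketch of the "local copy" done by hand. `SubFoo` is a minimal hypothetical stand-in (the field `value`, the method `step` and the function `testLocal` are assumed names, not from the thread); `test` mirrors the shape of the loop in the dmd output above, and `testLocal` is roughly the transformation gdc and ldc can apply on their own once the call is devirtualized and inlined:

```d
// Hypothetical stand-in for the thread's SubFoo; value/step are assumed names.
final class SubFoo
{
    int value;
    void step() { value = value * 3 + 1; }
}

// Same shape as the dmd output: each iteration reads obj.value from memory,
// updates it, and writes it back through the object pointer.
int test(SubFoo obj, int n)
{
    foreach (i; 0 .. n)
        obj.step();
    return obj.value;
}

// Hand-hoisted version: copy the field into a local, loop on the local,
// and store it back once at the end.
int testLocal(SubFoo obj, int n)
{
    int v = obj.value;      // one load before the loop
    foreach (i; 0 .. n)
        v = v * 3 + 1;      // a local the optimizer is free to keep in a register
    obj.value = v;          // one store after the loop
    return v;
}
```

The two functions leave the object in the same state; the hand-hoisted one just removes the per-iteration memory traffic that dmd leaves in.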

It isn't news that dmd's optimizer is pretty bad next to... well, pretty much everyone else nowadays, whether gdc, ldc, or Java, but it is sometimes nice to take a look at why.

The biggest magic of Java IMO here is being CPU cache friendly!
