What we're seeing here is pretty much the same problem that early C++ suffered from: abstraction penalty. It took years of work, in both the compilers and the libraries, to overcome it. When trivial functions aren't inlined and then optimized down through standard techniques like dead store elimination, value range propagation, various loop restructurings, etc., the code looks like what Walter and you have shown. Given DMD's relatively weak inliner, I'm not shocked by Walter's example. I am curious why LDC failed to inline those functions.

On 9/27/2014 2:59 PM, Peter Alexander via Digitalmars-d wrote:
On Saturday, 27 September 2014 at 20:57:53 UTC, Walter Bright wrote:
From time to time, I take a break from bugs and enhancements and just
look at what some piece of code is actually doing. Sometimes, I'm
appalled.

Me too, and yes it can be appalling. It's pretty bad for even simple
range chains, e.g.

import std.algorithm, std.stdio;
int main(string[] args) {
   return cast(int)args.map!("a.length").reduce!"a+b"();
}

Here's what LDC produces (with -O -inline -release -noboundscheck)

__Dmain:
0000000100001480    pushq    %r15
0000000100001482    pushq    %r14
0000000100001484    pushq    %rbx
0000000100001485    movq    %rsi, %rbx
0000000100001488    movq    %rdi, %r14
000000010000148b    callq    0x10006df10 ## symbol stub for:
__D3std5array14__T5emptyTAyaZ5emptyFNaNbNdNfxAAyaZb
0000000100001490    xorb    $0x1, %al
0000000100001492    movzbl    %al, %r9d
0000000100001496    leaq    _.str12(%rip), %rdx ## literal pool for:
"/Users/pja/ldc2-0.14.0-osx-x86_64/bin/../import/std/algorithm.d"
000000010000149d    movq    0xcbd2c(%rip), %r8 ## literal pool symbol
address:
__D3std9algorithm24__T6reduceVAyaa3_612b62Z124__T6reduceTS3std9algorithm85__T9MapResultS633std10functional36__T8unaryFunVAyaa8_612e6c656e677468Z8unaryFunTAAyaZ9MapResultZ6reduceFNaNfS3std9algorithm85__T

00000001000014a4    movl    $0x2dd, %edi
00000001000014a9    movl    $0x3f, %esi
00000001000014ae    xorl    %ecx, %ecx
00000001000014b0    callq    0x10006e0a2 ## symbol stub for:
__D3std9exception14__T7enforceTbZ7enforceFNaNfbLAxaAyamZb
00000001000014b5    movq    (%rbx), %r15
00000001000014b8    leaq    0x10(%rbx), %rsi
00000001000014bc    leaq    -0x1(%r14), %rdi
00000001000014c0    callq    0x10006df10 ## symbol stub for:
__D3std5array14__T5emptyTAyaZ5emptyFNaNbNdNfxAAyaZb
00000001000014c5    testb    $0x1, %al
00000001000014c7    jne    0x1000014fa
00000001000014c9    addq    $-0x2, %r14
00000001000014cd    addq    $0x20, %rbx
00000001000014d1    nopw    %cs:(%rax,%rax)
00000001000014e0    addq    -0x10(%rbx), %r15
00000001000014e4    movq    %r14, %rdi
00000001000014e7    movq    %rbx, %rsi
00000001000014ea    callq    0x10006df10 ## symbol stub for:
__D3std5array14__T5emptyTAyaZ5emptyFNaNbNdNfxAAyaZb
00000001000014ef    decq    %r14
00000001000014f2    addq    $0x10, %rbx
00000001000014f6    testb    $0x1, %al
00000001000014f8    je    0x1000014e0
00000001000014fa    movl    %r15d, %eax
00000001000014fd    popq    %rbx
00000001000014fe    popq    %r14
0000000100001500    popq    %r15
0000000100001502    ret

and for:

import std.algorithm, std.stdio;
int main(string[] args) {
   int r = 0;
   foreach (i; 0..args.length)
     r += args[i].length;
   return r;
}

__Dmain:
00000001000015c0    xorl    %eax, %eax
00000001000015c2    testq    %rdi, %rdi
00000001000015c5    je    0x1000015de
00000001000015c7    nopw    (%rax,%rax)
00000001000015d0    movl    %eax, %eax
00000001000015d2    addq    (%rsi), %rax
00000001000015d5    addq    $0x10, %rsi
00000001000015d9    decq    %rdi
00000001000015dc    jne    0x1000015d0
00000001000015de    ret

(and sorry, don't even bother looking at what dmd does...)

I'm not complaining about LDC here (although I'm surprised array.empty
isn't inlined). The way ranges are formulated makes them difficult to
optimize. I think there are things we can do here in the library. Maybe
I'll write up something about that at some point.
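
To make the shape of the problem concrete, here is a rough hand expansion of what the map/reduce chain asks the optimizer to see through. It is only a sketch, not the actual Phobos implementation: the function name sumLengths is made up for illustration, and an assert stands in for the enforce call that shows up in the disassembly. Each empty, front and popFront is a separate tiny function, so the loop only collapses to the foreach version if every one of them gets inlined.

```d
import std.array : empty, front, popFront;

// Hypothetical hand expansion of
//     args.map!"a.length".reduce!"a+b"()
// written against the range primitives for arrays.
size_t sumLengths(string[] args)
{
    // A seedless reduce needs at least one element to use as the seed;
    // the real code checks this with enforce, an assert stands in here.
    assert(!args.empty, "cannot reduce an empty range");

    size_t acc = args.front.length; // first mapped element is the seed
    args.popFront();

    // One call to empty per iteration unless the compiler inlines it.
    while (!args.empty)
    {
        acc += args.front.length;   // the map step: apply a.length to front
        args.popFront();            // advance the slice by one element
    }
    return acc;
}

void main()
{
    assert(sumLengths(["ab", "cde", "f"]) == 6);
}
```

Once those three primitives are inlined, what remains is a pointer bump and a counter, which is what the foreach loop compiles to.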

I think the takeaway here is that people should be aware of (a) what
kind of instructions their code is generating, (b) what kind of
instructions their code SHOULD be generating, and (c) what is
practically possible for present-day compilers. Like you say, it helps
to look at the assembled code once in a while to get a feel for this
kind of thing. Modern compilers are good, but they aren't magic.
