Re: Errors in TDPL

bearophile Mon, 21 Jun 2010 18:30:15 -0700

Andrei Alexandrescu:

>That's not too difficult; for integers, retro(iota(a, b)) could actually be a 
>rewrite to iota(b - 1, a, -1).<


This is good. In my dlibs, xchain converts xchain(xchain(x, y), z) and 
similar nested calls into xchain(x, y, z).
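
The integer rewrite quoted above can be checked against Phobos with a few 
asserts. This is only a sketch: descending() is a hypothetical helper, not a 
Phobos function, and it handles only signed ints with unit step. Since iota 
excludes its end bound, the sketch uses a - 1 as the end so that a itself is 
still visited.

```d
import std.algorithm : equal;
import std.range : iota, retro;

// Hypothetical helper (not in Phobos): the descending counterpart of
// iota(a, b) for signed ints with unit step.
auto descending(int a, int b) {
    // iota excludes its end bound, so the end is a - 1 to keep a.
    return iota(b - 1, a - 1, -1);
}

void main() {
    // Both ranges walk 7, 6, 5, 4, 3.
    assert(equal(retro(iota(3, 8)), [7, 6, 5, 4, 3]));
    assert(equal(descending(3, 8), retro(iota(3, 8))));
    // With end bound a instead of a - 1, the element a itself is dropped.
    assert(equal(iota(7, 3, -1), [7, 6, 5, 4]));
}
```

Unsigned indexes need more care, because a - 1 wraps around when a is 0.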


>Figuring out all corner cases, steps greater than 1, and what to do for 
>floating point numbers is doable but not trivial either, and works against 
>modularity.<

I suggest focusing on the most important case: integer/uint indexes with a +1 
or -1 increment.

My post showed several distinct problems:
- A longer compilation time (and a larger binary size).
- About 1/10 of the performance when the code is compiled the standard way 
  (no optimization flags), which is bad.
- Lower performance when the code is compiled in optimized mode.

The asm of the opt version shows calls to two or more functions inside the 
loop, and one of those functions is not so small. This probably reduces 
performance more than an extra inlined product does. So in my opinion getting 
rid of those calls (inlining them) is more important if you want a faster 
retro(iota).

------------------------

I have added your last version:

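// test4
import core.stdc.stdio: printf;  // was std.c.stdio in older Phobos releases
import std.range: iota;
void main() {
    enum int N = 100_000_000;
    int count;
    // Descending iota; the end bound 0 is excluded, so this visits N - 1 values.
    foreach (i; iota(N - 1, 0, -1))
        count++;
    printf("%d\n", count);
}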



Running time, best of 3, seconds:
  test1:     0.31
  test1 opt: 0.07
  test2:     0.31
  test2 opt: 0.12
  test3:     6.38
  test3 opt: 0.52
  test4:     4.70
  test4 opt: 1.25

not opt version = dmd (no other option)
opt version = dmd -O -release -inline
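
For anyone who wants to repeat a best-of-3 measurement from inside the 
program, here is a minimal sketch. It assumes a recent Phobos that provides 
std.datetime.stopwatch. It is not how the numbers above were taken, and 
countDown() is just a hypothetical wrapper around the test4 loop.

```d
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.range : iota;
import std.stdio : writefln;

// Hypothetical wrapper: counts the elements of a descending iota,
// as the test4 loop does.
int countDown(int n) {
    int count;
    foreach (i; iota(n - 1, 0, -1))
        count++;
    return count;
}

void main() {
    enum int N = 100_000_000;
    Duration best = Duration.max;
    int count;
    // Run the loop three times and keep the shortest wall-clock time.
    foreach (run; 0 .. 3) {
        auto sw = StopWatch(AutoStart.yes);
        count = countDown(N);
        sw.stop();
        if (sw.peek() < best)
            best = sw.peek();
    }
    writefln("count = %d, best of 3 = %d ms", count, best.total!"msecs");
}
```

Building it once with plain dmd and once with dmd -O -release -inline gives 
the same two configurations as in the table.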

Compile times opt version, seconds:
  test1: 0.05
  test2: 0.05
  test3: 0.28
  test4: 0.29

The compilation time is essentially the same. The non-optimized test4 is 
faster than the non-optimized test3, but the optimized test4 is considerably 
slower than the optimized test3.

Bye,
bearophile
