On Wednesday, 14 May 2014 at 06:00:33 UTC, Ola Fosheim Grøstad
wrote:
> Besides, D ranges will never perform as well as an optimized
> explicit loop, so you might as well aim for usability over
> speed.
There is no fundamental reason why this should be true. In fact,
ranges of medium complexity are already very close to explicit
loops in D when you use something like LDC.
Consider a sample program like this:
    void main()
    {
        import std.stdio;
        int max;
        readf("%d", &max);
        assert(foo1(max) == foo2(max));
    }

    int foo1(int max)
    {
        int sum = 0;
        for (auto i = 0; i < max; ++i)
        {
            sum += i*2;
        }
        return sum;
    }

    int foo2(int max)
    {
        import std.algorithm : reduce, map;
        import std.range : iota;
        return reduce!((a, b) => a + b)(0, iota(0, max).map!(a => a*2));
    }
Disassembly (ldmd2 -O -release -inline):
    0000000000402df0 <_D4test4foo1FiZi>:
      402df0: 31 c0                   xor    %eax,%eax
      402df2: 85 ff                   test   %edi,%edi
      402df4: 7e 10                   jle    402e06 <_D4test4foo1FiZi+0x16>
      402df6: 8d 47 ff                lea    -0x1(%rdi),%eax
      402df9: 8d 4f fe                lea    -0x2(%rdi),%ecx
      402dfc: 0f af c8                imul   %eax,%ecx
      402dff: 83 e1 fe                and    $0xfffffffe,%ecx
      402e02: 8d 44 79 fe             lea    -0x2(%rcx,%rdi,2),%eax
      402e06: c3                      retq
      402e07: 66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
      402e0e: 00 00

    0000000000402e10 <_D4test4foo2FiZi>:
      402e10: 31 c0                   xor    %eax,%eax
      402e12: 85 ff                   test   %edi,%edi
      402e14: 0f 48 f8                cmovs  %eax,%edi
      402e17: 85 ff                   test   %edi,%edi
      402e19: 74 10                   je     402e2b <_D4test4foo2FiZi+0x1b>
      402e1b: 8d 47 ff                lea    -0x1(%rdi),%eax
      402e1e: 8d 4f fe                lea    -0x2(%rdi),%ecx
      402e21: 0f af c8                imul   %eax,%ecx
      402e24: 83 e1 fe                and    $0xfffffffe,%ecx
      402e27: 8d 44 79 fe             lea    -0x2(%rcx,%rdi,2),%eax
      402e2b: c3                      retq
      402e2c: 0f 1f 40 00             nopl   0x0(%rax)
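The two are almost identical: the only difference is that foo2
first clamps a negative max to zero (the cmovs) and tests it again
before running the same five instructions. I fully expect the
output to become 100% identical at some point of compiler maturity.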
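
A side note on reading the disassembly above: neither function contains a loop any more. As far as I can tell, the optimizer has replaced both the explicit loop and the range pipeline with the closed form of the sum. The working below is my own reading of the instructions, not something the compiler prints; n stands for max, it assumes n > 0 (otherwise both functions return 0), and it ignores 32-bit wraparound.

```latex
\begin{align*}
% What foo1 and foo2 compute in the source:
\sum_{i=0}^{n-1} 2i &= 2 \cdot \frac{n(n-1)}{2} = n(n-1) \\
% What the lea / lea / imul / and / lea sequence computes:
\texttt{eax} &= (n-1)(n-2) + 2n - 2 \\
             &= (n-1)\bigl((n-2) + 2\bigr) \\
             &= n(n-1)
\end{align*}
```

The `and $0xfffffffe` only clears the lowest bit of (n-1)(n-2), which is already zero because the product of two consecutive integers is even, so it does not change the result. For example, max = 5 gives 0 + 2 + 4 + 6 + 8 = 20 from the loop and 4*3 + 10 - 2 = 20 from the instruction sequence.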