In Phobos (especially in std.algorithm) a lot of specialization between a RandomAccessRange and an InputRange are used even though only an InputRange is required. The reason for this is solely performance - imho we should start to tackle this and remove such specializations. Let's create better code!

To quote David from the recent PR (https://github.com/dlang/phobos/pull/4265) where adding a identity specialization achieves a 5-10x speedup for {min,max}Element.

I think that we have a larger issue at hand here. Ranges are of prime strategic importance for D as an accessible way of composing complex operations out of reusable primitives. A large part of this comes from the fact that they are supposed to be zero-cost abstractions – after the compiler optimiser has had its way with a piece of code, we expect it to be equivalent to a hand-written loop.

1) Should it matter whether generic range looping or "specialized" r[i] access is used?

Answer for dmd it does (7x slower), ldc does the job (nearly) correctly.

```
dmd -release -O -boundscheck=off test_looping.d && ./test_looping
random_access 685 ms, 351 μs, and 9 hnsecs
foreach       685 ms, 888 μs, and 4 hnsecs
foreach_ref   685 ms, 654 μs, and 4 hnsecs
for_range_api 7 secs, 211 ms, 530 μs, and 3 hnsecs

```
ldc -release -O3 -boundscheck=off test_looping.d && ./test_looping
random_access 142 ms and 88 μs
foreach       142 ms, 8 μs, and 3 hnsecs
foreach_ref   142 ms, 400 μs, and 9 hnsecs
for_range_api 143 ms, 394 μs, and 1 hnsec
```

Benchmarking code: https://github.com/wilzbach/perf-ranges/blob/master/test_looping.d
Bug report for DMD: https://issues.dlang.org/show_bug.cgi?id=16120

(Tested with both ldc 0.17 and 1.0)

2) Should it matter whether I use r.save or r[i..$]?

Answer: it does for dmd and ldc.

```
dmd -release -O -boundscheck=off test_save.d && ./test_save
random_access  424 ms and 72 μs
foreach_random 420 ms, 754 μs, and 5 hnsecs
for_range_api  2 secs, 562 ms, 415 μs, and 5 hnsecs

ldc -release -O3 -boundscheck=off test_save.d && ./test_save
random_access  245 ms, 713 μs, and 4 hnsecs
foreach_random 241 ms, 533 μs, and 1 hnsec
for_range_api  349 ms, 582 μs, and 9 hnsecs
```

Benchmarking code: https://github.com/wilzbach/perf-ranges/blob/master/test_save.d

Open question: how can we make DMD/LDC smarter, so that they inline range primitives and the promise about the zero-cost abstractions is full-filled?

Reply via email to