On Sunday, 5 October 2014 at 21:53:23 UTC, Ali Çehreli wrote:
On 10/05/2014 02:40 PM, Sativa wrote:

>      foreach(i; thds) { ulong s = 0; for(ulong k = 0; k < iter/numThreads; k++)

The for loop condition is evaluated on every iteration, and division is an expensive operation. Apparently, the compiler does some optimization when the divisor is known at compile time.

With a divisor of 4, the division is just a right shift by 2 bits. Try something like 5: it is slow even for an enum.
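A quick way to see why a divisor of 4 is cheap: for unsigned integers such as ulong, dividing by 2^n gives exactly the same result as a logical right shift by n bits, so the compiler can replace the division with a shift. A divisor like 5 has no single-shift equivalent. The D sketch below only checks that equivalence; the names divBy4 and shiftBy2 are illustrative and do not come from the original code.

```d
import std.stdio : writeln;

// Division by a power of two (4 == 2^2) on an unsigned type...
ulong divBy4(ulong x) { return x / 4; }

// ...is exactly a logical right shift by 2 bits.
ulong shiftBy2(ulong x) { return x >> 2; }

void main()
{
    foreach (x; [0UL, 1, 3, 4, 7, 1_000_000_007])
        assert(divBy4(x) == shiftBy2(x));

    writeln(divBy4(1_000_000_007), " == ", shiftBy2(1_000_000_007)); // prints "250000001 == 250000001"
}
```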

This solves the problem:

        const end = iter/numThreads;

        for(ulong k = 0; k < end; k++) {

Ali

Yes, it is a common problem when the loop bound is computed in the for loop's condition. Most of the time the bound is constant for the whole loop, but the compiler recomputes it on every iteration. For a simple sum, where the loop body does very little, this becomes expensive because the cost of the bound computation is comparable to the work done inside the loop.
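To make the two loop shapes concrete, here is a minimal, self-contained D sketch. The loop body (s += k) and the names sumRecomputed and sumHoisted are hypothetical stand-ins, since the original loop body is not shown; only the placement of iter/numThreads follows the quoted code. The thread count is read from the command line so that it is not a compile-time constant. Whether the optimizer hoists the division on its own depends on the compiler and flags, so the printed timings will vary.

```d
import std.conv : to;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writeln;

// Bound in the loop condition: iter / numThreads is evaluated on every
// iteration unless the optimizer proves it loop-invariant and hoists it.
ulong sumRecomputed(ulong iter, ulong numThreads)
{
    ulong s = 0;
    for (ulong k = 0; k < iter / numThreads; k++)
        s += k;
    return s;
}

// Bound hoisted out of the loop: the division runs exactly once.
ulong sumHoisted(ulong iter, ulong numThreads)
{
    const end = iter / numThreads;
    ulong s = 0;
    for (ulong k = 0; k < end; k++)
        s += k;
    return s;
}

void main(string[] args)
{
    enum ulong iter = 100_000_000;
    // Runtime value (defaults to 5), so the divisor is unknown at compile time.
    const ulong numThreads = args.length > 1 ? args[1].to!ulong : 5;

    auto sw = StopWatch(AutoStart.yes);
    const a = sumRecomputed(iter, numThreads);
    writeln("recomputed: ", sw.peek());

    sw.reset();
    const b = sumHoisted(iter, numThreads);
    writeln("hoisted:    ", sw.peek());

    assert(a == b); // same result, different cost per iteration
}
```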

It's surprising just how much it slows things down, though. In the real world one can't really make numThreads a compile-time constant, as that wouldn't be optimal (unless one had a version for each possible number of threads).
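The "version for each possible number of threads" idea can be expressed in D with a template value parameter plus a runtime switch. This is only a sketch under assumptions: partialSum, partialSumRuntime, sumForThreads and the set of supported counts (1, 2, 4, 8) are hypothetical, and any other count falls back to hoisting the runtime division out of the loop.

```d
import std.stdio : writeln;

// Hypothetical kernel with the thread count as a compile-time template
// parameter, so iter / numThreads divides by a known constant.
ulong partialSum(uint numThreads)(ulong iter)
{
    ulong s = 0;
    for (ulong k = 0; k < iter / numThreads; k++)
        s += k;
    return s;
}

// Fallback for counts without a dedicated instantiation: hoist the division.
ulong partialSumRuntime(uint numThreads, ulong iter)
{
    const end = iter / numThreads;
    ulong s = 0;
    for (ulong k = 0; k < end; k++)
        s += k;
    return s;
}

// Runtime dispatch to one instantiation per supported thread count.
ulong sumForThreads(uint numThreads, ulong iter)
{
    switch (numThreads)
    {
        case 1: return partialSum!1(iter);
        case 2: return partialSum!2(iter);
        case 4: return partialSum!4(iter);
        case 8: return partialSum!8(iter);
        default: return partialSumRuntime(numThreads, iter);
    }
}

void main()
{
    writeln(sumForThreads(4, 100)); // prints 300 (sum of 0 .. 24)
    writeln(sumForThreads(3, 100)); // prints 528 (sum of 0 .. 32, via the fallback)
}
```

Each extra supported count costs one more template instantiation, so in practice one would only specialize the few thread counts that actually occur.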

Obviously, one can just move the computation outside the loop. I would expect better results if the loops actually did some real work, since the bound computation would then be a small fraction of each iteration.

