On Tuesday, 6 May 2014 at 15:56:11 UTC, Kapps wrote:
On Monday, 5 May 2014 at 22:11:39 UTC, Ali Çehreli wrote:
On 05/05/2014 02:38 PM, Kapps wrote:

> I think that the GC actually blocks when creating objects, and thus
> multiple threads creating instances would not provide a significant
> speedup, possibly even a slowdown.

Wow! That is the case. :)

> You'd want to benchmark this to be certain it helps.

I did:

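import std.range;
import std.parallelism;

class C
{}

// Allocates a single GC-managed object.
void foo()
{
    auto c = new C;
}

void main(string[] args)
{
    enum totalElements = 10_000_000;

    if (args.length > 1) {
        // Any command-line argument selects the parallel loop.
        foreach (i; iota(totalElements).parallel) {
            foo();
        }
    } else {
        // No arguments: allocate serially on the main thread.
        foreach (i; iota(totalElements)) {
            foo();
        }
    }
}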

Typical run on my system for "-O -noboundscheck -inline":

$ time ./deneme parallel

real    0m4.236s
user    0m4.325s
sys     0m9.795s

$ time ./deneme

real    0m0.753s
user    0m0.748s
sys     0m0.003s


Huh, that's a much, much, higher impact than I'd expected.
I tried with GDC as well (the one in Debian stable, which is unfortunately still 2.055...) and got similar results. I also tried creating only totalCPUs threads and having each of them create NUM_ELEMENTS / totalCPUs objects rather than risking that each creation was a task, and it still seems to be the same.
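
For illustration, the chunked variant described above might look roughly like the following. This is a minimal sketch and not the code that was actually benchmarked: `createInChunks` is a hypothetical helper, and any remainder from the integer division is simply dropped.

```d
import core.atomic : atomicLoad, atomicOp;
import core.thread : Thread;

class C {}

// Hypothetical helper: starts `threadCount` threads, each allocating an equal
// share of `total` objects, and returns how many objects were created.
size_t createInChunks(size_t total, size_t threadCount)
{
    // Integer division; the remainder is ignored in this sketch.
    immutable perThread = total / threadCount;
    shared size_t created;

    Thread[] workers;
    foreach (t; 0 .. threadCount)
    {
        auto worker = new Thread({
            // Every `new C` still goes through the shared GC.
            foreach (i; 0 .. perThread)
            {
                auto c = new C;
            }
            atomicOp!"+="(created, perThread);
        });
        worker.start();
        workers ~= worker;
    }

    foreach (worker; workers)
        worker.join();

    return atomicLoad(created);
}
```

With `std.parallelism` imported, calling `createInChunks(10_000_000, totalCPUs)` would correspond to the setup described above: one thread per CPU rather than one task per allocation.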


I tried using an allocator that never releases memory, rounds allocation sizes up to a power of 2, and is lock-free. The results are quite a bit better:

shardsoft:~$ ./test
1 sec, 47 ms, 474 μs, and 4 hnsecs
shardsoft:~$ ./test
1 sec, 43 ms, 588 μs, and 2 hnsecs
shardsoft:~$ ./test tasks
692 ms, 769 μs, and 8 hnsecs
shardsoft:~$ ./test tasks
692 ms, 686 μs, and 8 hnsecs
shardsoft:~$ ./test parallel
691 ms, 856 μs, and 9 hnsecs
shardsoft:~$ ./test parallel
690 ms, 22 μs, and 3 hnsecs

I get similar results on my laptop (much faster than what I got on it using DMD's malloc):
1 sec, 125 ms, and 847 μs
1 sec, 125 ms, 741 μs, and 6 hnsecs

test tasks
556 ms, 613 μs, and 8 hnsecs
test tasks
552 ms and 287 μs

test parallel
554 ms, 542 μs, and 6 hnsecs
test parallel
551 ms, 514 μs, and 9 hnsecs


Unfortunately it doesn't compile with the ancient version of gdc available in Debian, so I couldn't test with that. The results should be quite a bit better since core.atomic would be faster. And frankly, I'm not sure if the allocator actually works properly, but it's just for testing purposes anyways.
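
The test allocator itself was not posted, so the following is only a guess at the general shape of such a thing: a bump allocator that never frees, rounds sizes up to a power of 2, and reserves each block with a single atomic add from core.atomic. All names here (`poolSize`, `minBlock`, `bumpAlloc`, `make`) are hypothetical, and the fixed pool size is an arbitrary assumption.

```d
import core.atomic : atomicOp;
import core.stdc.stdlib : malloc;
import std.conv : emplace;

enum size_t poolSize = 64 * 1024 * 1024; // hypothetical fixed pool size
enum size_t minBlock = 16;               // keeps every block 16-byte aligned

__gshared ubyte* pool;   // backing memory, never released
shared size_t nextFree;  // offset of the first unused byte in the pool

shared static this()
{
    pool = cast(ubyte*) malloc(poolSize);
    assert(pool !is null, "could not reserve the pool");
}

// Rounds n up to a power of two, with minBlock as the smallest result.
size_t roundUpPow2(size_t n)
{
    size_t p = minBlock;
    while (p < n)
        p <<= 1;
    return p;
}

// Lock-free allocation: one atomic add reserves the block, so threads never
// wait on each other. Returns null once the pool is exhausted.
void* bumpAlloc(size_t size)
{
    immutable rounded = roundUpPow2(size);
    immutable end = atomicOp!"+="(nextFree, rounded); // returns the new offset
    if (end > poolSize)
        return null;
    return pool + (end - rounded);
}

// Constructs a class instance in memory taken from the pool.
T make(T)() if (is(T == class))
{
    enum size = __traits(classInstanceSize, T);
    auto mem = bumpAlloc(size);
    assert(mem !is null, "pool exhausted");
    return emplace!T(mem[0 .. size]);
}
```

In the benchmark, `auto c = make!C();` would then stand in for `auto c = new C;`. Note that objects allocated this way are never collected or finalized, and the pool memory is not scanned by the GC, so this only makes sense for a throwaway test like the one above.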
