On 12/11/2011 8:48 AM, bearophile wrote:
This program used here as a benchmark is a bit reduced from a rosettacode task, 
it finds how many ways are there to make change for 100_000 euro (the argument 
'amount' represents cents of euro) using the common coins.

The result is:
992198221207406412424859964272600001

The timings, best of 3, seconds:
   DMD:    22.5
   Python:  9.3
   Java:    2.9

DMD 2.057beta, -O -release -inline
Java SE 1.7.0_01-b08 (used without -server)
Python 2.6.6
32 bit Windows system.

The D version is about 7.7 times slower than the Java client version. I have 
seen that most running time in the D code is spent in this line that performs 
the BigInt sum:

table[pos] += table[pos - 1];

With some tests I think most of the run time of the D version is used by the 
GC, so this is mostly a GC benchmark (so here the sum routines of D BigInts are 
probably good enough). I think in this benchmark the low D GC performance is 
not caused by the not precise nature of the D GC (because during the running 
the amount of memory used is constantly 7.7 MB), but mostly by the amount of 
garbage produced each second. So maybe it's the Young Generation of the Java GC 
that is helping a lot here. But even the reference count GC of Python is more 
efficient here.

I have not timed the GC performance with the recent experiments done by dsimcha 
on the D GC discussed here, a timing value with those changes is welcome:
https://github.com/dsimcha/druntime/wiki/GC-Optimizations-Round-2

My optimizations make very little difference on this benchmark, but for good reason: It's not a very good GC benchmark. I ran it with my GC profiling code enabled and it only spends around 10% of its execution time in GC. We need to figure out why else this benchmark may be so slow.

Furthermore, because this benchmark uses so little GC time, my improvements are mostly buried in noise. Here are some sample runs on 2.057 beta with and without my GC improvements. However, I make no claim that these results are statistically significant, as when I ran the benchmark a few times the variance was quite high and I'm too lazy to run it a zillion times and get a mean and variance for each stage, etc.

Without my improvements:
992198221207406412424859964272600001
        Total GC prep time:  62 milliseconds
        Total mark time:  456 milliseconds
        Total sweep time:  1357 milliseconds
        Total page recovery time:  439 milliseconds
        Grand total GC time:  2314 milliseconds

Process returned 0 (0x0)   execution time : 20.776 s

With my improvements:
992198221207406412424859964272600001
        Total GC prep time:  16 milliseconds
        Total mark time:  393 milliseconds
        Total sweep time:  1049 milliseconds
        Total page recovery time:  297 milliseconds
        Grand total GC time:  1755 milliseconds

Process returned 0 (0x0)   execution time : 19.287 s

Reply via email to