On 12/11/2011 8:48 AM, bearophile wrote:
This program used here as a benchmark is a bit reduced from a rosettacode task,
it finds how many ways are there to make change for 100_000 euro (the argument
'amount' represents cents of euro) using the common coins.
The result is:
992198221207406412424859964272600001
The timings, best of 3, seconds:
DMD: 22.5
Python: 9.3
Java: 2.9
DMD 2.057beta, -O -release -inline
Java SE 1.7.0_01-b08 (used without -server)
Python 2.6.6
32 bit Windows system.
The D version is about 7.7 times slower than the Java client version. I have
seen that most running time in the D code is spent in this line that performs
the BigInt sum:
table[pos] += table[pos - 1];
With some tests I think most of the run time of the D version is used by the
GC, so this is mostly a GC benchmark (so here the sum routines of D BigInts are
probably good enough). I think in this benchmark the low D GC performance is
not caused by the not precise nature of the D GC (because during the running
the amount of memory used is constantly 7.7 MB), but mostly by the amount of
garbage produced each second. So maybe it's the Young Generation of the Java GC
that is helping a lot here. But even the reference count GC of Python is more
efficient here.
I have not timed the GC performance with the recent experiments done by dsimcha
on the D GC discussed here, a timing value with those changes is welcome:
https://github.com/dsimcha/druntime/wiki/GC-Optimizations-Round-2
My optimizations make very little difference on this benchmark, but for
good reason: It's not a very good GC benchmark. I ran it with my GC
profiling code enabled and it only spends around 10% of its execution
time in GC. We need to figure out why else this benchmark may be so slow.
Furthermore, because this benchmark uses so little GC time, my
improvements are mostly buried in noise. Here are some sample runs on
2.057 beta with and without my GC improvements. However, I make no
claim that these results are statistically significant, as when I ran
the benchmark a few times the variance was quite high and I'm too lazy
to run it a zillion times and get a mean and variance for each stage, etc.
Without my improvements:
992198221207406412424859964272600001
Total GC prep time: 62 milliseconds
Total mark time: 456 milliseconds
Total sweep time: 1357 milliseconds
Total page recovery time: 439 milliseconds
Grand total GC time: 2314 milliseconds
Process returned 0 (0x0) execution time : 20.776 s
With my improvements:
992198221207406412424859964272600001
Total GC prep time: 16 milliseconds
Total mark time: 393 milliseconds
Total sweep time: 1049 milliseconds
Total page recovery time: 297 milliseconds
Grand total GC time: 1755 milliseconds
Process returned 0 (0x0) execution time : 19.287 s