On Monday, 4 March 2013 at 15:44:40 UTC, bearophile wrote:
> John Colvin:
>> The performance of the multiplication loops and the
>> performance of the allocation are separate issues and should
>> be measured as such, especially if one wants to make
>> meaningful optimisations.
> If you want to improve the D compiler, druntime, etc, then I
> agree you have to separate the variables and test them one at a
> time. But if you are comparing languages+runtimes+libraries
> then it's better to not cheat, and test the whole running
> (warmed) time.
> Bye,
> bearophile
I disagree. Information about which parts of the code are running
fast and which are running slow is critical to optimisation. If
you don't know whether it's the D memory allocation that's slow
or the D multiplication loops, you're trying to optimise
blindfolded.
Even if all you're doing is a comparison, it's a lot more useful to
know *where* the slowdown is happening so that you can make a
meaningful analysis of the results.
Enter a strange example:
I found that malloc'ed multi-dim arrays were considerably faster
to iterate over and assign to than D GC slices, even with the GC
disabled after allocation and bounds checks turned off.
If I hadn't bothered to do separate timings of the allocation and
iteration, I would never have noticed this effect and would instead
have written it off as purely "malloc is faster at allocating than
the GC".
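
For anyone who wants to try it, here is a rough sketch of what I mean by
timing the two phases separately. It is not my original benchmark: the
sizes, the fill function and the use of a flat array instead of a true
multi-dim one are all assumptions made to keep it short, and it uses
std.datetime.stopwatch from a recent Phobos. Build with something like
-O -release -boundscheck=off to match the conditions described above.

```d
import core.memory : GC;
import core.stdc.stdlib : malloc, free;
import std.datetime.stopwatch : StopWatch, AutoStart;
import std.stdio : writefln;

// Arbitrary illustrative sizes (assumption, not from the original test).
enum size_t rows = 1000;
enum size_t cols = 1000;

// The "iterate and assign" phase, identical for both kinds of storage.
void fill(double[] data)
{
    foreach (i, ref x; data)
        x = i * 0.5;
}

void main()
{
    auto sw = StopWatch(AutoStart.yes);

    // Phase 1a: GC allocation, timed on its own.
    auto gcData = new double[](rows * cols);
    auto gcAlloc = sw.peek();

    // Phase 1b: iteration over the GC slice, with the GC disabled
    // so that no collection can run during the loop.
    GC.disable();
    sw.reset();
    fill(gcData);
    auto gcIter = sw.peek();
    GC.enable();

    // Phase 2a: malloc allocation, timed on its own.
    sw.reset();
    auto p = cast(double*) malloc(rows * cols * double.sizeof);
    assert(p !is null, "malloc failed");
    auto cData = p[0 .. rows * cols]; // slice over the malloc'ed block
    auto cAlloc = sw.peek();
    scope (exit) free(p);

    // Phase 2b: iteration over the malloc'ed block.
    sw.reset();
    fill(cData);
    auto cIter = sw.peek();

    writefln("GC:     alloc %s, iterate %s", gcAlloc, gcIter);
    writefln("malloc: alloc %s, iterate %s", cAlloc, cIter);

    // Read the results back so the fills cannot be optimised away.
    writefln("last elements: %s %s", gcData[$ - 1], cData[$ - 1]);
}
```

A single run like this is noisy, so repeat each phase several times
before reading anything into the numbers. The point is only that the
allocation and iteration figures are reported separately, so a
difference in one can't hide behind the other.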