Jarrett Billingsley wrote:
> I hope bearophile will eventually understand that DMD is not good at
> optimizing code, and so comparing its output to GCC's is ultimately
> meaningless.

The long arithmetic benchmark is completely (and I mean completely) dominated by the time spent in the long divide helper function. The timing results for it really have nothing to do with the compiler optimizer or code generator. Reducing the number of instructions in the loop by one or improving pairing slightly does nothing when stacked up against maybe 50 instructions in the long divide helper function.

The long divide helper dmd uses (phobos\internal\llmath.d) is code I basically wrote 25 years ago and have hardly looked at since except to carry it forward. It uses the classic shift-and-subtract algorithm, but there are better ways to do it now with the x86 instruction set.
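For readers unfamiliar with shift-and-subtract division, here is a minimal sketch in C. It is not the code in llmath.d: the function name udiv64_shift_subtract, its signature, and its handling of a zero divisor are assumptions made purely for illustration. It only shows the general technique of producing one quotient bit per pass.

```c
#include <stdint.h>

/* Hypothetical illustration, NOT the llmath.d helper: classic
   shift-and-subtract (restoring) division of unsigned 64-bit values.
   Each pass shifts one dividend bit into the running remainder and
   subtracts the divisor if it fits, yielding one quotient bit. */
uint64_t udiv64_shift_subtract(uint64_t n, uint64_t d, uint64_t *rem)
{
    uint64_t q = 0;  /* quotient, built from the top bit down */
    uint64_t r = 0;  /* running remainder, always < d at the top of the loop */

    if (d == 0) {    /* assumption: the sketch picks its own divide-by-zero result */
        if (rem) *rem = n;
        return UINT64_MAX;
    }

    for (int i = 63; i >= 0; i--) {
        uint64_t carry = r >> 63;          /* bit lost when r is shifted left */
        r = (r << 1) | ((n >> i) & 1);     /* bring down the next dividend bit */
        if (carry || r >= d) {             /* divisor fits into the remainder */
            r -= d;                        /* wraps correctly when carry is set */
            q |= (uint64_t)1 << i;
        }
    }

    if (rem) *rem = r;
    return q;
}
```

For example, calling udiv64_shift_subtract(100, 7, &r) returns 14 and stores 2 in r. The loop in this sketch always runs 64 passes with several operations each, which gives a feel for how a helper written this way can outweigh a small loop body that calls it.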

Time to have some fun doing hand-coded assembler again!

Fixing this should bring that loop timing up to par, but it's still not a good benchmark for a code generator. Coming up with good *code generator* benchmarks is hard, and really can't be done without looking at the assembler output to make sure that what you think is happening is what is actually happening.

I've seen a lot of benchmarks over the years, and too many of them do things like measure malloc() or printf() speed instead of loop optimizations or other intended measurements. Caching and alignment issues can also dominate the results.

I haven't looked closely at the other loop yet.
