I have already hit long-division-related speed issues in my D code. Sometimes simple things can dominate a benchmark, but those same simple things can dominate user code too!
Walter Bright Wrote:

> Jarrett Billingsley wrote:
> > I hope bearophile will eventually understand that DMD is not good at
> > optimizing code, and so comparing its output to GCC's is ultimately
> > meaningless.
>
> The long arithmetic benchmark is completely (and I mean completely)
> dominated by the time spent in the long divide helper function. The
> timing results for it really have nothing to do with the compiler
> optimizer or code generator. Reducing the number of instructions in the
> loop by one or improving pairing slightly does nothing when stacked up
> against maybe 50 instructions in the long divide helper function.
>
> The long divide helper dmd uses (phobos\internal\llmath.d) is code I
> basically wrote 25 years ago and have hardly looked at since except to
> carry it forward. It uses the classic shift-and-subtract algorithm, but
> there are better ways to do it now with the x86 instruction set.
>
> Time to have some fun doing hand-coded assembler again!
>
> Fixing this should bring that loop timing up to par, but it's still not
> a good benchmark for a code generator. Coming up with good *code
> generator* benchmarks is hard, and really can't be done without looking
> at the assembler output to make sure that what you think is happening is
> what is actually happening.
>
> I've seen a lot of benchmarks over the years, and too many of them do
> things like measure malloc() or printf() speed instead of loop
> optimizations or other intended measurements. Caching and alignment
> issues can also dominate the results.
>
> I haven't looked closely at the other loop yet.
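For anyone who hasn't seen the "classic shift-and-subtract algorithm" mentioned in the quote, here is a rough sketch of the general technique in C. To be clear, this is not the code from llmath.d (I haven't reproduced that here); the function name `udiv64_shift_subtract` and its signature are made up purely for illustration, and it only handles the unsigned 64-bit case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only (not dmd's actual routine):
   shift-and-subtract division of two unsigned 64-bit integers.
   Returns the quotient and stores the remainder through *rem. */
uint64_t udiv64_shift_subtract(uint64_t n, uint64_t d, uint64_t *rem)
{
    assert(d != 0);            /* division by zero is not handled here */

    uint64_t q = 0;            /* quotient, built one bit at a time */
    uint64_t r = 0;            /* running remainder, always < d between steps */

    for (int i = 63; i >= 0; --i) {
        /* Remember the bit that the shift below pushes out of r. */
        uint64_t carry = r >> 63;

        /* Shift the remainder left and bring down bit i of the dividend. */
        r = (r << 1) | ((n >> i) & 1u);

        /* If the (conceptually 65-bit) remainder is at least d, subtract d
           and set the matching quotient bit. */
        if (carry || r >= d) {
            r -= d;
            q |= (uint64_t)1 << i;
        }
    }

    *rem = r;
    return q;
}
```

The loop runs once per bit of the dividend, with a shift, a compare and a conditional subtract each time around, so a single division costs many more instructions than the loop body that calls it.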