The new measurement reporting pages have highlighted many improvement opportunities, and as you might have seen, I've lately fixed a handful of the _basecase functions for x86-64.
An aspect not directly covered by the new measurement reporting is how the 32-bit and 64-bit performance-per-limb compare for each CPU, so I took a look at the timing tables manually. With the current GMP code, it turns out that 32-bit performance-per-limb is typically worse, or much worse, on any semi-modern CPU. For the non-multiply functions the disadvantage is not too bad, but a speedup of 50% or more is possible in most cases. For multiply the headroom is much greater, and since an operand needs twice as many 32-bit limbs as 64-bit limbs, the 32-bit basecase routines already do roughly four times as many limb products; performance improvements here will therefore be quite significant.

Our latest batch of x86-32 code dates from 2011 (for the original Intel atom), but we have not done anything for high-end AMD and Intel CPUs (e.g., AMD k10, bulldozer, piledriver, steamroller, excavator, zen, or Intel penryn, nehalem, sandybridge, ivybridge, haswell, broadwell, skylake, kabylake) in a very long time.

I took a quick look into what could be done with a limited effort. We currently use 64-bit sse2 code for some CPUs, since that was optimal at the time. However, all of the CPUs listed above sustain full execution rate for 128-bit sse2 code, but we don't make use of that anywhere. Furthermore, Intel haswell, broadwell, skylake, and kabylake have 256-bit execution of avx2.

What do I have in mind? I believe pmovzxdq, pmuludq, psrlq (or some shuffle insn), and paddq could be used to build an addmul_2 which runs at close to 1 cycle/limb using sse2, and with avx2 the corresponding instructions could build an addmul_4 running at not far from 0.5 cycle/limb. A rough scalar sketch of the intended structure is included at the end of this message. Any volunteer willing to give it a try?

One might wonder whether sse2/avx2 couldn't also improve the x86-64 code. Unfortunately, they cannot; the absence of a 64-bit SIMD multiply makes plain scalar operations far better there.
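To make the addmul_2 idea concrete, here is a minimal scalar sketch of the column accumulation such a kernel would perform on 32-bit limbs. The function name and the exact interface (n+1 limbs written at rp, top limb returned) are purely illustrative and do not necessarily match our mpn_addmul_2 conventions. The point is the mapping: each pair of 32x32->64 products is what a single pmuludq would compute, the high/low splits and the column shift correspond to psrlq (or a shuffle), and the remaining additions to paddq.

#include <stdint.h>
#include <stddef.h>

/* Illustrative only: compute {rp,n} + {up,n} * (v0 + v1*2^32) for 32-bit
   limbs, store the n+1 low limbs at rp, and return the top limb.  */
uint32_t
addmul_2_sketch (uint32_t *rp, const uint32_t *up, size_t n,
                 uint32_t v0, uint32_t v1)
{
  uint64_t c0 = 0, c1 = 0;	/* pending sums for columns i and i+1 */
  for (size_t i = 0; i < n; i++)
    {
      /* One pmuludq would compute both 32x32->64 products.  */
      uint64_t p0 = (uint64_t) up[i] * v0;
      uint64_t p1 = (uint64_t) up[i] * v1;
      /* Low half of p0 and the r limb belong to column i (paddq).  */
      c0 += (p0 & 0xffffffff) + rp[i];
      /* High half of p0 and low half of p1 belong to column i+1
	 (psrlq + paddq).  */
      c1 += (p0 >> 32) + (p1 & 0xffffffff);
      rp[i] = (uint32_t) c0;
      /* Shift the column pair down one limb; the carry out of column i
	 moves into column i+1, the high half of p1 starts column i+2.  */
      c0 = c1 + (c0 >> 32);
      c1 = p1 >> 32;
    }
  rp[n] = (uint32_t) c0;
  return (uint32_t) (c1 + (c0 >> 32));
}

With avx2 the same structure, applied to four v limbs at a time, is what would give an addmul_4.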
-- 
Torbjörn
Please encrypt, key id 0xC8601622