t...@gmplib.org (Torbjörn Granlund) writes:

> Our latest batch of x86-32 code dates from 2011 (for the original Intel
> atom) but we have not done anything for high-end AMD and Intel CPUs
> (e.g., AMD k10, bulldozer, piledriver, steamroller, excavator, zen, or
> Intel penryn, nehalem, sandybridge, ivybridge, haswell, broadwell,
> skylake, kabylake) in a very long time.
When are those later CPUs run in 32-bit mode? M$ Windows or Mac applications? I would have expected x86_64 mode, possibly with some use of the x32 ABI (small pointers), to be used almost exclusively by now.

> What do I have in mind? I believe pmovzxdq, pmuludq, psrlq (or some
> shuffle insn), and paddq could be used to build an addmul_2 which runs
> at close to 1 cycle/limb using sse2,

I think I looked at pmuludq in the past, the variant doing two 32x32->64 multiplies, without having any success. IIRC, the throughput of that instruction on then-current CPUs was too poor to make it useful. Other possible reasons for failure: (i) I didn't try hard enough, (ii) too much shuffling around of the operands is needed.

BTW, speaking of addmul_2: where the current addmul_2 wins over addmul_1, it's because we get more independent mul instructions and can more easily saturate the multiplication units. At least, that's my understanding.

We've considered using karatsuba aka toom2 for addmul_2, but it has always turned out that the saving of 1/4 of the multiply instructions (3 instead of 4) is easily eaten up by the additional operations needed.

But the other day, it struck me that we might also try doing addmul_2 using toom32, which would save 1/3 of the mul instructions (4 instead of 6). Toom32 is nice because we can use the four easiest evaluation points: 0, infinity, and +/-1.

Or addmul_3 using toom32, which has the additional advantage that more of the evaluation work is loop-invariant, and we could also jump to separate inner loops depending on the carry bits from evaluation.

Perhaps this is still crazy, and useful only for machines with very slow multiplication.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel
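As a rough illustration of the toom32 idea in the post, here is a small Python sketch. It is not GMP code, and the names toom32_mul, schoolbook_mul and from_limbs are hypothetical. It multiplies a three-piece operand a(x) = a0 + a1*x + a2*x^2 by a two-piece operand b(x) = b0 + b1*x using four pointwise products, at 0, +1, -1 and infinity, in place of the six that schoolbook multiplication needs. The sketch assumes the pieces are unbounded Python integers. A real addmul_2 built this way would also have to handle the extra bits of a0 + a1 + a2, the signs of a0 - a1 + a2 and b0 - b1, and the exact halvings during interpolation. That is the kind of extra work that could eat up the saved multiplies.

```python
def toom32_mul(a, b):
    """Product of a(x) = a0 + a1*x + a2*x^2 and b(x) = b0 + b1*x.

    Uses 4 pointwise multiplications (at x = 0, +1, -1, infinity)
    instead of the 6 needed by schoolbook multiplication.
    """
    a0, a1, a2 = a
    b0, b1 = b

    # Evaluation: additions and subtractions only. With real 64-bit limbs,
    # a0 + a1 + a2 needs up to 2 extra bits, b0 + b1 one extra bit, and
    # a0 - a1 + a2 and b0 - b1 may be negative (ignored here).
    w0 = a0 * b0                       # x = 0
    w1 = (a0 + a1 + a2) * (b0 + b1)    # x = +1
    wm1 = (a0 - a1 + a2) * (b0 - b1)   # x = -1
    winf = a2 * b1                     # x = infinity (leading coefficients)

    # Interpolation of c(x) = c0 + c1*x + c2*x^2 + c3*x^3:
    #   w1  = c0 + c1 + c2 + c3
    #   wm1 = c0 - c1 + c2 - c3
    # so w1 + wm1 and w1 - wm1 are both even and the halvings are exact.
    c0 = w0
    c3 = winf
    c2 = (w1 + wm1) // 2 - c0
    c1 = (w1 - wm1) // 2 - c3
    return [c0, c1, c2, c3]


def schoolbook_mul(a, b):
    """Reference product using len(a) * len(b) multiplications."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c


def from_limbs(c, bits=64):
    """Recombine coefficients as an integer in base 2**bits."""
    return sum(ci << (bits * i) for i, ci in enumerate(c))


if __name__ == "__main__":
    a, b = [3, 5, 7], [11, 13]
    print(toom32_mul(a, b))      # [33, 94, 142, 91]
    print(schoolbook_mul(a, b))  # [33, 94, 142, 91]
```

Treating the pieces as 64-bit limbs and evaluating c(x) at x = 2**64 (from_limbs) gives the same integer as the full product of the two limb sequences. In an addmul_2 setting, the two limbs of v play the role of b, and a is a three-limb chunk of u.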