Torbjorn Granlund <t...@gmplib.org> writes:

> You mean 10 cycles per for one U limb multiplied by the 2 V limbs?
> Then 7/2 = 3.5 c/l is a good start.

Unfortunately not. speed -C -s ... mpn_addmul_2 reported around 14
cycles, so it's 7 c/l, compared to 2.38 for the current non-SIMD code,
if I interpret the speed output correctly.

> What about SIMD multiply-accumulate? IIRC, these insns have the same
> latency and throughput as non-accumulating SIMD multiplies.

Should look into that (I didn't notice any useful integer
multiply-accumulate instructions on my first reading of the manual).
But I suspect you get them on the critical path, and then the relevant
comparison is to add latency, not mul latency.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
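
To spell out the arithmetic behind the figures above, here is a minimal C sketch. The helper names are hypothetical (they are not part of GMP or of the speed program), and it assumes that the reported cycle count is per U limb, which is the interpretation used above.

```c
#include <assert.h>

/* Hypothetical helper: cycles per limb product (c/l), given the cycle
   count reported per U limb and the number of V limbs handled per pass
   (2 for an addmul_2-style loop). */
static double cycles_per_limb(double cycles_per_u_limb, int v_limbs)
{
    assert(v_limbs > 0);
    return cycles_per_u_limb / v_limbs;
}

/* Hypothetical helper: how many times slower a candidate loop is than a
   baseline, both measured in c/l. */
static double slowdown(double candidate_cl, double baseline_cl)
{
    assert(baseline_cl > 0.0);
    return candidate_cl / baseline_cl;
}
```

With the numbers from this message, cycles_per_limb(14.0, 2) gives 7 c/l, and slowdown(7.0, 2.38) comes out a little under 3, so the SIMD attempt is roughly three times slower than the current code under that interpretation.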