Torbjorn Granlund <t...@gmplib.org> writes:

> You mean 10 cycles for one U limb multiplied by the 2 V limbs?
> Then 7/2 = 3.5 c/l is a good start.

Unfortunately not. speed -C -s ... mpn_addmul_2 reported around 14
cycles per U limb, so it's 7 c/l, compared to 2.38 c/l for the current
non-SIMD code, if I interpret the speed output correctly.
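
To make the c/l arithmetic concrete, here is a minimal portable sketch of
what an addmul_2-style loop computes. It is not GMP's code: the names
limb_t, ref_addmul_1 and ref_addmul_2 are hypothetical, the limb is assumed
to be 32 bits so that uint64_t can hold a full product, and the exact
calling convention of the real mpn_addmul_2 may differ. The point is only
that each U limb takes part in two limb products, so 14 cycles per U limb
corresponds to 14/2 = 7 c/l.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* assumption: 32-bit limbs, for portability */

/* {rp, n} += {up, n} * v; returns the carry-out limb. */
static limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* (B-1)^2 + 2(B-1) = B^2 - 1, so this cannot overflow 64 bits. */
        uint64_t t = (uint64_t)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> 32);
    }
    return carry;
}

/* Adds {up, n} * {vp, 2} to {rp, n}. Writes the low n+1 limbs of the
   result to rp[0..n] and returns the most significant limb.
   Each U limb is multiplied by both V limbs: 2n limb products in total. */
static limb_t ref_addmul_2(limb_t *rp, const limb_t *up, size_t n,
                           const limb_t *vp)
{
    rp[n] = ref_addmul_1(rp, up, n, vp[0]);      /* U * v0 at weight B^0 */
    return ref_addmul_1(rp + 1, up, n, vp[1]);   /* U * v1 at weight B^1 */
}
```

A real implementation would fuse the two passes into one loop so that each
U limb is loaded once; the two-pass form above is only the simplest way to
state the result.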

> What about SIMD multiply-accumulate?  IIRC, these insns have the same
> latency and throughput as non-accumulating SIMD multiplies.

I should look into that (I didn't notice any useful integer
multiply-accumulate instructions on my first reading of the manual). But
I suspect they would end up on the critical path, in which case the
relevant comparison is to add latency, not mul latency.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel
