ni...@lysator.liu.se (Niels Möller) writes:

  In Chapter 3, multiplication instructions are listed in a table starting
  on page "3-14". But now I see I read the entry for a smaller data size.
  For 32-bit inputs, it's apparently 2 cycles, not 1.

It seems to be 2 cycles indeed:

	.text
	.globl	main
	.type	main, #function
main:
	mov	r0, #1006632960
1:	subs	r0, r0, #1
	vmull.u32	q2, d0, d0
	vmull.u32	q4, d0, d0
	vmull.u32	q6, d0, d0
	vmull.u32	q8, d0, d0
	bne	1b
	mov	pc, lr

But IIUC, we are thus performing a 32 x 32 -> 64 mul per cycle.

Can one stick an addition in here without consuming cycles?

-- 
Torbjörn
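
The "one 32 x 32 -> 64 mul per cycle" estimate can be checked with a small model. This is only a sketch of the arithmetic, not a measurement. It assumes that each vmull.u32 multiplies the two 32-bit lanes of its d operands into two 64-bit products, and that the subs/bne loop overhead costs no extra cycles. The function names are made up for illustration.

```python
MASK32 = (1 << 32) - 1

# Loop parameters taken from the test program above.
ITERATIONS = 1006632960      # initial value of r0
VMULL_PER_ITERATION = 4      # four vmull.u32 instructions per loop pass


def vmull_u32(dn, dm):
    """Model vmull.u32 qd, dn, dm: two 32-bit lanes in, two 64-bit products out."""
    return [(a & MASK32) * (b & MASK32) for a, b in zip(dn, dm)]


def muls_per_cycle(cycles_per_insn, lanes=2):
    """32 x 32 -> 64 multiplies retired per cycle for one vmull.u32 stream."""
    return lanes / cycles_per_insn


def total_cycles(cycles_per_insn):
    """Cycle count for the whole loop, ASSUMING subs/bne are free."""
    return ITERATIONS * VMULL_PER_ITERATION * cycles_per_insn


if __name__ == "__main__":
    print(vmull_u32([0xFFFFFFFF, 2], [0xFFFFFFFF, 3]))
    print(muls_per_cycle(2))
    print(total_cycles(2))
```

Dividing a measured run time by total_cycles(2) and comparing the result with the clock period is one way to tell a 2-cycle vmull.u32 from a 1-cycle one.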