Re: arm "neon"

Torbjorn Granlund Mon, 14 Jan 2013 09:41:00 -0800

ni...@lysator.liu.se (Niels Möller) writes:

  Torbjorn Granlund <t...@gmplib.org> writes:

  > I played with vmlal.u32 on A9 and A15.  Surprisingly, both CPUs are very
  > cooperative in that the accumulation dependency is very shallow.

  Nice. Is the same true for the non-simd umaal instruction?

Things are complicated; I cannot get my head around what's going on.


This example, with partially overlapping operands (r2 shared), runs at 12 cycles:

        umaal   r2, r1, r14, r14
        umaal   r2, r3, r14, r14
        umaal   r2, r5, r14, r14
        umaal   r2, r7, r14, r14

This example, with fully overlapping operands, runs at 8 cycles:

        umaal   r2, r1, r14, r14
        umaal   r2, r1, r14, r14
        umaal   r2, r1, r14, r14
        umaal   r2, r1, r14, r14

This example, with partially overlapping operands (r1 shared), runs at 16 cycles:

        umaal   r2, r1, r14, r14
        umaal   r4, r1, r14, r14
        umaal   r6, r1, r14, r14
        umaal   r8, r1, r14, r14

With completely independent operands, things run at 8 cycles.
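For reference, here is a minimal C model of what a single "umaal RdLo, RdHi, Rn, Rm" computes, assuming the ARM definition RdHi:RdLo = Rn*Rm + RdHi + RdLo. It shows that both RdLo and RdHi are read and written, so reusing either register across the sequences above creates a true dependency between consecutive instructions. The function name umaal and its pointer-based signature are hypothetical. This is a sketch of the semantics only and says nothing about the timings measured above.

```c
#include <stdint.h>

/* Hypothetical C model of ARM "umaal RdLo, RdHi, Rn, Rm".
   RdLo and RdHi are both inputs and outputs, which is why
   reusing either one across consecutive umaal instructions
   forms a dependency chain. */
static void umaal(uint32_t *rd_lo, uint32_t *rd_hi, uint32_t rn, uint32_t rm)
{
    /* (2^32-1)^2 + 2*(2^32-1) == 2^64-1, so the sum never overflows. */
    uint64_t t = (uint64_t)rn * rm + *rd_hi + *rd_lo;
    *rd_lo = (uint32_t)t;          /* low 32 bits  -> RdLo */
    *rd_hi = (uint32_t)(t >> 32);  /* high 32 bits -> RdHi */
}
```

In this model the first sequence corresponds to umaal(&r2, &r1, r14, r14), umaal(&r2, &r3, r14, r14), and so on, with r2 threaded through every call. The third sequence threads r1 instead.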

-- 
Torbjörn
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel
