On Wednesday 30 September 2015 13:19:02 Thiago Macieira wrote:
> On Wednesday 30 September 2015 13:05:23 Light, John J wrote:
> > There are 11 pointers, so on a 64-bit machine, we copy 88 bytes.
> 
> On a Sandybridge, that's 2x256-bit moves, one 128-bit move and one 64-bit
> move. Or 5x128-bit moves and one 64-bit one.

Or, when your compiler is smart, it can do even better than I had thought.

$ cat test.cpp
struct S { void *ptrs[11]; }; 
void f(S);                              // pass by value
void f(S *s) { f(*s); }         // will cause copy

$ clang -march=native -S -o - -O2 test.cpp
[assembly cut for relevance]
_Z1fP1S:                                # @_Z1fP1S
        vmovups (%rdi), %ymm0
        vmovups 32(%rdi), %ymm1
        vmovups 56(%rdi), %ymm2
        vmovups %ymm2, 56(%rsp)
        vmovups %ymm1, 32(%rsp)
        vmovups %ymm0, (%rsp)
        vzeroupper
        callq   _Z1f1S
        addq    $88, %rsp
        retq

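That was three 256-bit loads and three 256-bit stores. The compiler managed
that by using overlapping loads and stores: the third move starts at offset 56
instead of 64, so it re-copies 8 bytes of the second one and ends exactly at
byte 88.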
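
For illustration, here is roughly what that trick looks like when written by
hand in portable C++. This is only a sketch: copy88_overlapping is a name I
made up for this example, it is not what clang emits internally, and the 32-byte
temporaries merely stand in for the ymm registers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not from any library): copies exactly 88 bytes as
// three 32-byte chunks at offsets 0, 32 and 56. The last chunk overlaps
// the middle one by 8 bytes, so no separate 16-byte and 8-byte tail
// copies are needed.
inline void copy88_overlapping(void *dst, const void *src)
{
    assert(dst && src);

    constexpr std::size_t Chunk = 32;   // one 256-bit register
    constexpr std::size_t Total = 88;   // 11 pointers of 8 bytes each
    static_assert(Total >= Chunk, "tail chunk must fit inside the object");

    unsigned char *d = static_cast<unsigned char *>(dst);
    const unsigned char *s = static_cast<const unsigned char *>(src);

    // Temporaries standing in for ymm0, ymm1 and ymm2.
    unsigned char r0[Chunk], r1[Chunk], r2[Chunk];

    // Three loads: offsets 0, 32 and 56 (= Total - Chunk).
    std::memcpy(r0, s, Chunk);
    std::memcpy(r1, s + Chunk, Chunk);
    std::memcpy(r2, s + Total - Chunk, Chunk);

    // Three stores, in the same order as the assembly above. Bytes 56..63
    // are written twice with identical data, which is harmless.
    std::memcpy(d + Total - Chunk, r2, Chunk);
    std::memcpy(d + Chunk, r1, Chunk);
    std::memcpy(d, r0, Chunk);
}
```

Whether a given compiler turns those fixed-size memcpy calls back into three
vector moves depends on the compiler, the optimisation level and the -march
setting, so check the generated assembly before relying on it.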

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center
