On Wednesday 30 September 2015 13:19:02 Thiago Macieira wrote:
> On Wednesday 30 September 2015 13:05:23 Light, John J wrote:
> > There are 11 pointers, so on a 64-bit machine, we copy 88 bytes.
>
> On a Sandybridge, that's 2x256-bit moves, one 128-bit move and one 64-bit
> move. Or 5x128-bit moves and one 64-bit one.

Or, when your compiler is smart, it can do even better than I had thought.

$ cat test.cpp
struct S { void *ptrs[11]; };
void f(S); // pass by value
void f(S *s) { f(*s); } // will cause copy
$ clang -march=native -S -o - -O2 test.cpp
[assembly cut for relevance]
_Z1fP1S: # @_Z1fP1S
vmovups (%rdi), %ymm0
vmovups 32(%rdi), %ymm1
vmovups 56(%rdi), %ymm2
vmovups %ymm2, 56(%rsp)
vmovups %ymm1, 32(%rsp)
vmovups %ymm0, (%rsp)
vzeroupper
callq _Z1f1S
addq $88, %rsp
retq
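That was three 256-bit loads and three 256-bit stores. The compiler managed
that by using overlapping loads and stores: the second move covers bytes
32 to 63 and the third starts at offset 56, so bytes 56 to 63 are copied
twice and the last move ends exactly at byte 88.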
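
For illustration, here is roughly the same trick written out by hand in C++.
This is only a sketch: copy_overlapping and the constant names are made up
for this example, and the 88-byte size assumes 11 pointers of 8 bytes each.
Whether a given compiler turns each 32-byte chunk into a single vmovups is
not something this code guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy 88 bytes as three 32-byte chunks.
// The chunks start at offsets 0, 32 and 56, so the last one overlaps
// the second by 8 bytes (bytes 56..63 are written twice).
void copy_overlapping(unsigned char *dst, const unsigned char *src)
{
    assert(dst != nullptr && src != nullptr);

    constexpr std::size_t Chunk = 32;   // one 256-bit register
    constexpr std::size_t Total = 88;   // 11 pointers * 8 bytes (assumed)
    const std::size_t offsets[] = { 0, Chunk, Total - Chunk };

    for (std::size_t off : offsets) {
        unsigned char tmp[Chunk];
        std::memcpy(tmp, src + off, Chunk);   // the "load"
        std::memcpy(dst + off, tmp, Chunk);   // the "store"
    }
}
```

The overlap is harmless because the overlapping bytes get the same value
both times, and it avoids the narrower 128-bit and 64-bit tail moves.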
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel Open Source Technology Center