From: Dave Rodgman
> Sent: 30 November 2018 11:48
> From: Matt Sealey <[email protected]>
> 
> Most compilers should be able to merge adjacent loads/stores whose sizes
> are each less than a machine word but together add up to a multiple of
> one (in effect a memcpy() of a constant amount). However, the macro only
> does the copy; the pointer increment is in the calling code, hence we
> see
> 
>     *a = *b
>     a += 8
>     b += 8
>     *a = *b
>     a += 8
>     b += 8
> 
> This introduces a dependency between the two groups of statements which
> seems to defeat said compiler optimizers and generate some very strange
> sequences of addition and subtraction of address offsets (i.e. it is
> overcomplicated).
> 
> Since COPY8 is only ever used to copy amounts of 16 bytes (in pairs),
> just define COPY16 as COPY8,COPY8. We keep the COPY8 definition so that
> the unaligned machine-word accesses intended by the original code are
> preserved; we just don't use it directly in the code proper.
> 
> COPY16 then gives us code like:
> 
>     *a = *b
>     *(a+8) = *(b+8)
>     a += 16
>     b += 16

You probably actually want:
        t1 = *b;
        t2 = *(b+8);
        *a = t1;
        *(a+8) = t2;
        a += 16;
        b += 16;
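
As a rough sketch of that ordering in portable C (not the actual kernel
macro): copy16_loads_first() and copy_blocks() are hypothetical names made
up for illustration, and memcpy() into locals stands in for whatever
unaligned 64-bit access helper the real code uses. Doing both loads before
either store means the compiler does not have to assume the first store
could change what the second load reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy 16 bytes, issuing both loads before either
 * store. memcpy() keeps the unaligned accesses well-defined in C. */
static inline void copy16_loads_first(unsigned char *a, const unsigned char *b)
{
    uint64_t t1, t2;

    memcpy(&t1, b, 8);          /* t1 = *b;       */
    memcpy(&t2, b + 8, 8);      /* t2 = *(b+8);   */
    memcpy(a, &t1, 8);          /* *a = t1;       */
    memcpy(a + 8, &t2, 8);      /* *(a+8) = t2;   */
}

/* Hypothetical caller: copy n bytes (n a multiple of 16); the pointer
 * increments stay outside the 16-byte copy, once per block. */
static void copy_blocks(unsigned char *a, const unsigned char *b, size_t n)
{
    assert(n % 16 == 0);
    while (n) {
        copy16_loads_first(a, b);
        a += 16;
        b += 16;
        n -= 16;
    }
}
```

Note the two orderings are only equivalent when the source and destination
do not overlap within the 16-byte window. If a is b + 8, the interleaved
form re-reads bytes it has just written, whereas this form does not.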

        David

