On Sun, Jul 10, 2016 at 1:10 PM, Alexandra Hájková
<[email protected]> wrote:

Some fairly minor nits:

> +++ b/libavcodec/x86/hevc_idct.asm

> +cglobal hevc_idct_%1x%1_dc_%3, 1, 2, 1, coeff, tmp
> +    movsx             tmpq, word [coeffq]
> +    add               tmpw, ((1 << 14-%3) + 1)
> +    sar               tmpw, (15-%3)
> +    movd               xm0, tmpd

Using a dword destination instead of a qword one for the movsx gets
rid of an unnecessary REX prefix.

Can the add overflow 16 bits, i.e. is the 16-bit shift (as opposed to
a 32-bit one) actually required for truncation? If not, use dword for
all of those instructions to avoid the possibility of partial-register
stalls on some CPUs.
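
For illustration, the dword variant would look like this (a sketch,
valid only if the 16-bit truncation really is unnecessary):

movsx             tmpd, word [coeffq]      ; dword destination, no REX.W prefix
add               tmpd, ((1 << 14-%3) + 1) ; full 32-bit add, no partial-register write
sar               tmpd, (15-%3)
movd               xm0, tmpd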

[...]

> +.loop:
> +    mova [coeffq+mmsize*0], m0
> +    mova [coeffq+mmsize*1], m0
> +    mova [coeffq+mmsize*2], m0
> +    mova [coeffq+mmsize*3], m0
> +    mova [coeffq+mmsize*4], m0
> +    mova [coeffq+mmsize*5], m0
> +    mova [coeffq+mmsize*6], m0
> +    mova [coeffq+mmsize*7], m0
> +    add  coeffq, mmsize*8
> +    dec  cntd
> +    jg  .loop

Offsets in the range [-128,127] can be encoded in 1 byte, whereas
larger offsets require 4 bytes, and mmsize*4 is already 128 when using
ymm registers. The code size can therefore be reduced slightly by
reordering the instructions like this:

mova [coeffq+mmsize*0], m0
mova [coeffq+mmsize*1], m0
mova [coeffq+mmsize*2], m0
mova [coeffq+mmsize*3], m0
add  coeffq, mmsize*8
mova [coeffq+mmsize*-4], m0
mova [coeffq+mmsize*-3], m0
mova [coeffq+mmsize*-2], m0
mova [coeffq+mmsize*-1], m0
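
(The dec cntd / jg .loop pair follows as before. With ymm registers
the four stores at mmsize*4..7 previously used displacements of
128..224 bytes, each needing a 4-byte encoding; after the reorder
every displacement fits in [-128,127], saving 3 bytes per store, or 12
bytes per loop iteration.)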