On Sun, Jul 10, 2016 at 1:10 PM, Alexandra Hájková <[email protected]> wrote:
Some fairly minor nits:

> +++ b/libavcodec/x86/hevc_idct.asm
> +cglobal hevc_idct_%1x%1_dc_%3, 1, 2, 1, coeff, tmp
> +    movsx  tmpq, word [coeffq]
> +    add    tmpw, ((1 << 14-%3) + 1)
> +    sar    tmpw, (15-%3)
> +    movd   xm0, tmpd

Using dword instead of qword for the movsx gets rid of an unnecessary
REX prefix. Can the add overflow 16 bits, i.e. is the 16-bit shift
(rather than a 32-bit one) required for truncation? If not, use dword
for all of those instructions to avoid possible partial-register
stalls on some CPUs (see the sketch at the end of this mail).

[...]

> +.loop:
> +    mova [coeffq+mmsize*0], m0
> +    mova [coeffq+mmsize*1], m0
> +    mova [coeffq+mmsize*2], m0
> +    mova [coeffq+mmsize*3], m0
> +    mova [coeffq+mmsize*4], m0
> +    mova [coeffq+mmsize*5], m0
> +    mova [coeffq+mmsize*6], m0
> +    mova [coeffq+mmsize*7], m0
> +    add    coeffq, mmsize*8
> +    dec    cntd
> +    jg .loop

Offsets in the range [-128,127] can be encoded in one byte, whereas
larger offsets require four bytes, and mmsize*4 is already 128 when
using ymm registers. The code size can therefore be reduced slightly
by reordering the instructions like this (with ymm this saves three
bytes on each of the last four stores):

    mova [coeffq+mmsize*0], m0
    mova [coeffq+mmsize*1], m0
    mova [coeffq+mmsize*2], m0
    mova [coeffq+mmsize*3], m0
    add  coeffq, mmsize*8
    mova [coeffq+mmsize*-4], m0
    mova [coeffq+mmsize*-3], m0
    mova [coeffq+mmsize*-2], m0
    mova [coeffq+mmsize*-1], m0
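Slotted back into the quoted loop, keeping the dec/jg from the patch
unchanged, that would be:

.loop:
    mova [coeffq+mmsize*0], m0
    mova [coeffq+mmsize*1], m0
    mova [coeffq+mmsize*2], m0
    mova [coeffq+mmsize*3], m0
    add  coeffq, mmsize*8
    mova [coeffq+mmsize*-4], m0
    mova [coeffq+mmsize*-3], m0
    mova [coeffq+mmsize*-2], m0
    mova [coeffq+mmsize*-1], m0
    dec  cntd
    jg   .loop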
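And for the earlier dword point, a rough sketch of what I mean,
assuming the 16-bit truncation of the add/sar really isn't needed for
correctness:

    movsx  tmpd, word [coeffq]           ; sign-extend into the full 32-bit reg
    add    tmpd, ((1 << 14-%3) + 1)      ; same rounding constant, 32-bit add
    sar    tmpd, (15-%3)                 ; 32-bit arithmetic shift
    movd   xm0, tmpd                     ; unchanged, already reads tmpd

Since the movsx already sign-extends into the full 32-bit register,
the later instructions can simply operate on tmpd, and the movd at the
end needs no change.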
