Quoting James Almer (2015-08-22 23:58:41)
> On 22/08/15 1:16 PM, Anton Khirnov wrote:
> >>> +%macro QPEL_8 2
> >>> +%if %2
> >>> +    %define postfix    v
> >>> +    %define mvfrac     myq
> >>
> >> Same here and below the else, rename this to mvfracq and add a mvfracd.
> >>
> >>> +    %define pixstride  srcstrideq
> >>> +    %define pixstride3 sstride3q
> >>> +    %define src_m3     srcm3q
> >>> +%else
> >>> +    %define postfix    h
> >>> +    %define mvfrac     mxq
> >>> +    %define pixstride  1
> >>> +    %define pixstride3 3
> >>> +    %define src_m3     (srcq - 3)
> >>> +%endif
> >>> +
> >>> +cglobal hevc_qpel_ %+ postfix %+ _ %+ %1 %+ _8, 8, 10, 7, dst, dststride, src, srcstride, height, mx, my, sstride3, srcm3, coeffsreg
> 
> This should be 7, 10, 7. Otherwise you're loading sstride3 from stack as if
> it were a function argument.
> Ideally though, for vertical you'd use 5, 9, 7 then manually load either mx
> or my instead of both, saving one register, or even 5, 8, 7, since coeffsreg
> and mvfrac are only used during init, and you can easily reuse one of those
> two registers for sstride3 or srcm3.
> You can also push it down to 4, 7, 7 if you manually load height before or
> after the SPLATWs and reuse the regs for coeffsreg and mvfrac. As a plus,
> this would make the functions work with x86_32.
> 
> For horizontal you don't even need sstride3 or srcm3, so you definitely should
> declare and use less registers.
> 
> Didn't check other functions but I'm sure similar optimizations can be done.
> 
> >>> +%if %2
> >>> +    and       mvfrac, 0x3
> >>> +%endif
> >>> +    dec       mvfrac
> >>> +    shl       mvfrac, 4
> >>
> >> Use mvfracd on these three, it will clear the high bits for the mova below.
> > 
> > anding the whole register with 3/7 should also work fine, with less
> > clutter.
> 
> "and mvfrac, 0x3" is only in ff_hevc_qpel_v_* functions, but not 
> ff_hevc_qpel_h_*.
> It's the same with the "and mvfrac, 0x7" cases below.

Sure, I meant to change the code so it's done in both paths.
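
A minimal sketch of why masking the whole register would be enough, modeled
in C rather than asm. The helper name coeff_offset is made up for
illustration, and the sketch assumes a zero fraction never reaches this code
and that the shift by 4 selects a 16-byte entry for the mova:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of the init sequence on a full 64-bit register.
 * mask is 0x3 for the qpel case and 0x7 for the "and mvfrac, 0x7" case. */
static uint64_t coeff_offset(uint64_t reg, uint64_t mask)
{
    reg &= mask;      /* and mvfrac, mask: every bit above the mask is cleared */
    assert(reg != 0); /* assumption: callers never pass a zero fraction */
    reg -= 1;         /* dec mvfrac */
    reg <<= 4;        /* shl mvfrac, 4: byte offset in steps of 16 */
    return reg;
}
```

Whatever garbage sits in the upper bits on entry, the result depends only on
the low bits selected by the mask, so the same three instructions could be
used unconditionally in both the h and v paths.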

> You need to use the d suffix
> instead of q on the register names to make sure the high bits are cleared.

Eh? Perhaps I'm misunderstanding something, but I'd expect that using d
here would do exactly the opposite and keep the random data in the high bits.

-- 
Anton Khirnov