Quoting James Almer (2015-08-22 23:58:41)
> On 22/08/15 1:16 PM, Anton Khirnov wrote:
> >>> +%macro QPEL_8 2
> >>> +%if %2
> >>> +    %define postfix    v
> >>> +    %define mvfrac     myq
> >>
> >> Same here and below the else, rename this to mvfracq and add a mvfracd.
> >>
> >>> +    %define pixstride  srcstrideq
> >>> +    %define pixstride3 sstride3q
> >>> +    %define src_m3     srcm3q
> >>> +%else
> >>> +    %define postfix    h
> >>> +    %define mvfrac     mxq
> >>> +    %define pixstride  1
> >>> +    %define pixstride3 3
> >>> +    %define src_m3     (srcq - 3)
> >>> +%endif
> >>> +
> >>> +cglobal hevc_qpel_ %+ postfix %+ _ %+ %1 %+ _8, 8, 10, 7, dst,
> >>> dststride, src, srcstride, height, mx, my, sstride3, srcm3, coeffsreg
>
> This should be 7, 10, 7, Otherwise you're loading sstride3 from stack as if
> it were a function argument.
> Ideally though, for vertical you'd use 5, 9, 7 then manually load either mx
> or my instead of both, saving one register, or even 5, 8, 7, since coeffsreg
> and mvfrac are only used during init, and you can easily reuse one of those
> two registers for sstride3 or srcm3.
> You can also push it down to 4, 7, 7 if you manually load height before or
> after the SPLATWs and reuse the regs for coeffsreg and mvfrac. As a plus,
> this would make the functions work with x86_32.
>
> For horizontal you don't even need sstride3 or srcm3, so you definitely should
> declare and use less registers.
>
> Didn't check other functions but I'm sure similar optimizations can be done.
>
> >>> +%if %2
> >>> +    and mvfrac, 0x3
> >>> +%endif
> >>> +    dec mvfrac
> >>> +    shl mvfrac, 4
> >>
> >> Use mvfracd on these three, it will clear the high bits for the mova below.
> >
> > anding the whole register with 3/7 should also work fine, with less
> > clutter.
>
> "and mvfrac, 0x3" is only in ff_hevc_qpel_v_* functions, but not
> ff_hevc_qpel_h_*.
> It's the same with the "and mvfrac, 0x7" cases below.

Sure, I meant to change the code so it's done in both paths.

> You need to use the d suffix
> instead of q on the register names to make sure the high bits are cleared.

Eh? Perhaps I'm misunderstanding something, but I'd expect that using d
here would do exactly the opposite and keep the random data in the high
bits.

-- 
Anton Khirnov
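
On the d versus q question: in 64-bit mode, an instruction that writes a 32-bit register zero-extends the result into the full 64-bit register, so the high bits do not survive a write through the d alias. The sketch below is only a Python model of that rule applied to the and/dec/shl sequence from the patch. It is not the actual assembly, and the helper names `offset_q` and `offset_d` and the garbage test value are made up for illustration.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def offset_q(reg, and_mask=None):
    """Model 'and' (optional), 'dec', 'shl 4' on the full 64-bit register."""
    if and_mask is not None:
        reg = (reg & and_mask) & MASK64
    reg = (reg - 1) & MASK64
    return (reg << 4) & MASK64


def offset_d(reg, and_mask=None):
    """Model the same sequence on the 32-bit alias (d suffix).

    Each write keeps only the low 32 bits of the result; the upper
    32 bits of the 64-bit register become zero.
    """
    if and_mask is not None:
        reg = (reg & and_mask) & MASK32
    reg = (reg - 1) & MASK32
    return (reg << 4) & MASK32


# Hypothetical register contents: fraction 2 in the low bits,
# leftover junk in the upper 32 bits.
garbage = 0xDEADBEEF00000002

print(hex(offset_q(garbage, 3)))  # 64-bit ops with the mask applied first
print(hex(offset_d(garbage)))     # 32-bit ops, no mask
print(hex(offset_q(garbage)))     # 64-bit ops, no mask
```

In this model both suggestions from the thread produce a clean table offset: masking the whole register with 3 (or 7) clears the junk explicitly, and using the d alias clears it as a side effect of the 32-bit write. Only the 64-bit sequence with no mask carries the junk through to the address used by the following mova.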
