On Thu, Jul 21, 2016 at 2:48 AM, Josh de Kock <[email protected]> wrote:
> +cglobal hevc_add_residual_16_8, 3, 5, 7, dst, coeffs, stride
> + pxor m0, m0
> + lea r3, [strideq * 3]
> + RES_ADD_SSE_16_32_8 0, dstq, dstq + strideq
> + RES_ADD_SSE_16_32_8 64, dstq + strideq * 2, dstq + r3
> + mov r4d, 3
> +.loop:
> + add coeffsq, 128
> + lea dstq, [dstq + strideq * 4]
> + RES_ADD_SSE_16_32_8 0, dstq, dstq + strideq
> + RES_ADD_SSE_16_32_8 64, dstq + strideq * 2, dstq + r3
> + dec r4d
> + jnz .loop
> + RET
There's no need to peel the first iteration out of the loop. You can do all four iterations within the loop instead, e.g. something like:
    mov r4d, 4
.loop:
    RES_ADD_SSE_16_32_8 0, dstq, dstq + strideq
    RES_ADD_SSE_16_32_8 64, dstq + strideq * 2, dstq + r3
    add coeffsq, 128
    lea dstq, [dstq + strideq * 4]
    dec r4d
    jnz .loop
(the same applies to all other similar functions)
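
For what it's worth, here is a rough scalar C sketch (not part of the patch) that checks both loop shapes touch the same rows and coefficients. The names add_4_rows, add_residual_16_peeled, add_residual_16_looped and variants_agree are made up for illustration, and the clipping to [0, 255] is my assumption about what the two RES_ADD_SSE_16_32_8 calls do for 8-bit content; the macro body is not quoted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Clamp to the 8-bit pixel range (assumed behaviour of the SIMD macro). */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical scalar stand-in for one pair of RES_ADD_SSE_16_32_8 calls:
 * adds 4 rows of 16 residuals (64 int16_t = 128 bytes) to dst. */
static void add_4_rows(uint8_t *dst, const int16_t *coeffs, ptrdiff_t stride)
{
    assert(stride >= 16);
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 16; x++)
            dst[y * stride + x] =
                clip_u8(dst[y * stride + x] + coeffs[y * 16 + x]);
}

/* Shape of the quoted patch: first iteration peeled, then 3 in the loop. */
void add_residual_16_peeled(uint8_t *dst, const int16_t *coeffs,
                            ptrdiff_t stride)
{
    add_4_rows(dst, coeffs, stride);
    for (int i = 3; i != 0; i--) {
        coeffs += 64;          /* add coeffsq, 128 (bytes) */
        dst    += stride * 4;  /* lea dstq, [dstq + strideq * 4] */
        add_4_rows(dst, coeffs, stride);
    }
}

/* Suggested shape: all 4 iterations inside the loop, pointers bumped last. */
void add_residual_16_looped(uint8_t *dst, const int16_t *coeffs,
                            ptrdiff_t stride)
{
    for (int i = 4; i != 0; i--) {
        add_4_rows(dst, coeffs, stride);
        coeffs += 64;
        dst    += stride * 4;
    }
}

/* Returns 1 if both variants produce identical output on a test pattern. */
int variants_agree(void)
{
    enum { STRIDE = 24 };
    uint8_t a[16 * STRIDE], b[16 * STRIDE];
    int16_t coeffs[16 * 16];

    for (int i = 0; i < 16 * STRIDE; i++)
        a[i] = b[i] = (uint8_t)(i * 7);
    for (int i = 0; i < 16 * 16; i++)
        coeffs[i] = (int16_t)((i * 37) % 601 - 300);

    add_residual_16_peeled(a, coeffs, STRIDE);
    add_residual_16_looped(b, coeffs, STRIDE);
    return memcmp(a, b, sizeof(a)) == 0;
}
```

The only difference is that the looped version advances coeffs and dst once more after the last block, which is harmless since neither pointer is read again.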